Next Article in Journal
Identifying Allosteric Small-Molecule Binding Sites of Inactive NS2B-NS3 Proteases of Pathogenic Flaviviridae
Previous Article in Journal
Equine Infectious Anemia Virus Cellular Partners Along the Viral Cycle
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Viral Transcript and Tumor Immune Microenvironment-Based Transcriptomic Profiling of HPV-Associated Head and Neck Squamous Cell Carcinoma Identifies Subtypes Associated with Prognosis

by
Anastasiia Nikitina
1,
Daria Kiriy
1,
Andrey Tyshevich
1,
Dmitry Tychinin
1,
Zoya Antysheva
1,
Anastasya Sobol
1,
Vladimir Kushnarev
1,
Nara Shin
1,
Jessica H. Brown
1,
James Lewis, Jr.
2,3,
Krystle A. Lang Kuhs
4,5,
Robert Ferris
6,
Lori Wirth
7,8,†,
Nikita Kotlov
1,† and
Daniel L. Faden
7,8,9,10,*,†
1
BostonGene Corp., Waltham, MA 02453, USA
2
Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
3
Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN 37232, USA
4
Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
5
Department of Epidemiology & Environmental Health, College of Public Health, University of Kentucky, Lexington, KY 40536, USA
6
Department of Immunology, University of Pittsburgh, Pittsburgh, PA 15213, USA
7
Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
8
Massachusetts General Hospital, Boston, MA 02114, USA
9
Massachusetts Eye and Ear, Boston, MA 02114, USA
10
Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Viruses 2025, 17(1), 4; https://doi.org/10.3390/v17010004
Submission received: 10 October 2024 / Revised: 30 November 2024 / Accepted: 17 December 2024 / Published: 24 December 2024
(This article belongs to the Special Issue Immune Responses to Papillomavirus Infections: 2nd Edition)

Abstract

:
Human papillomavirus (HPV)-associated head and neck squamous cell carcinoma (HPV-positive HNSCC) has distinct biological characteristics from HPV-negative HNSCC. Using an AI-based analytical platform on meta cohorts, we profiled expression patterns of viral transcripts and HPV viral genome integration, and classified the tumor microenvironment (TME). Unsupervised clustering analysis revealed five distinct and novel TME subtypes across patients (immune-enriched, highly immune and B-cell enriched, fibrotic, immune-desert, and immune-enriched luminal). These TME subtypes were highly correlated with patient prognosis. In order to understand specific factors associated with prognosis, we used the unsupervised clustering of an HPV-positive HNSCC cohort from The Cancer Genome Atlas (TCGA) (n = 53) based on HPV transcript expression, and identified four HPV-related subtypes (E2/E5, E6/E7, E1/E4 and L1/L2). Utilizing both viral transcript and TME subtypes, we found that the E2/E5 HPV subtype was associated with an immune-enriched TME and had a higher overall survival rate compared to other subtypes. The E2/E5 subtype was also enriched for samples without HPV-genome integration, suggesting that the episomal HPV status and E2/E5 expression pattern may be associated with an inflamed microenvironment and improved prognosis. In contrast, E6/E7 subtype samples were associated with the fibrotic and immune-desert TME subtypes, with lower values of T-cell and B-cell gene expression signatures and lower overall survival. Both E1/E4 and L1/L2 subtypes were associated with the immune-enriched luminal subtype. Our results suggest that HPV-transcript expression patterns may drive the modulation of the TME, and thereby impact prognosis.

1. Introduction

Human papillomavirus (HPV)-associated head and neck squamous cell carcinoma (HPV-positive HNSCC) is the most common HPV-associated malignancy in the United States. While many targets have been evaluated spanning genomics, proteomics, transcriptomics and viromics, currently, there is a lack of validated predictive biomarkers in clinical use, which has stagnated efforts to personalize treatments [1,2,3,4]. Thus, at present, many patients receive excessive treatment with severe lifelong side effects, while others are undertreated and at risk for relapse. Here, we describe the use of a next-generation sequencing (NGS) data-based analytical platform to characterize the expression patterns of viral transcripts, the tumor microenvironment (TME), and viral genome integration, and associate these features with overall survival (OS).
Growing evidence has indicated that the TME, a complex milieu of various cell types, including immune cells, fibroblasts, and endothelial cells, plays a significant role in a tumor’s growth and progression, and modulates its response to therapy [5,6,7,8]. Given the immunogenic nature of HPV, it is hypothesized that the TME in HPV-positive HNSCCs would exhibit distinct characteristics compared to HPV-negative HNSCC [9]. Furthermore, the expression patterns of viral transcripts within the tumor could play a role in modulating the TME, and subsequently, the tumor’s behavior and prognosis [10].
In this study, we aimed to dissect the TME in HPV-positive HNSCCs using functional gene expression signatures (Fges) and explore the associations between viral transcript expression patterns, TME subtypes, and patient prognosis.

2. Materials and Methods

2.1. HPV Status Prediction Classifier

2.1.1. Data Preparation

Multiple datasets consisting of head and neck cancer samples that were available with HPV status were used to create the classifier (Table 1).
The data were obtained both using various microarray platforms and using RNA sequencing (RNA-seq). Training and testing were performed with independent datasets and both cohorts contained data from different RNA protocols.

2.1.2. TCGA Labels Preparation

The HPV status of TCGA samples was assigned based on a previous TCGA report [19]. For the samples missing HPV status in the TCGA annotation, Pathseq (https://software.broadinstitute.org/pathseq/) was used as described further in the “Determination of HPV viral status” section in the methods. Pathseq output for the following serotypes was used: Alphapapillomavirus_12, Alphapapillomavirus_4, Alphapapillomavirus_7 (=HPV18), Human_papillomavirus_5, Human_papillomavirus_9, Human_papillomavirus_type_10, Human_papillomavirus_type_16, and Human_papillomavirus_1.

2.1.3. Feature Preparation

Gene expression values were used to train the RNA classifier. To eliminate batch effects without introducing cohort-based batches, we performed rank transformation across samples. As the initial set of features, we used 158 genes described by Ankur et al., to be significantly differentially expressed between HPV-positive and HPV-negative HNSCC tumors [20].

2.1.4. Classifier Training, Tuning and Testing

As a machine learning model, we implemented the gradient boosting LGBM Classifier. Hyperparameters were chosen using 5-fold cross-validation and the F1-weighted metric was chosen for performance assessment. After the training step, the set of hyperparameters that showed the best quality on cross-validation was evaluated to determine the importance of the gene features. The feature with the lowest importance score was removed from the dataset. The subsequent training step was carried out without this gene. Iterations continued until 144 features remained. Finally, the set of features and hyperparameters was chosen that maximized the F1-weighted score. The list of features selected is listed in Supplementary Table S1. After this final feature selection step we performed one final tuning of hyperparameters. Model testing was performed on a hold-out independent dataset, which included 20% of the the TCGA HNSCC cohort and another microarray-based cohort (Table 1).

2.2. Microarray Data Processing

Raw and processed microarray data were downloaded from GEO. Expression was re-processed from raw files, if possible, using affygcRMA and oligo R packages. All affymetrix datasets with available CEL files were re-normalized using the gcRMA package with default parameters. Illumina array data were downloaded from GEO as is.

2.3. RNA-Seq Processing

RNA-seq data were processed from raw reads as described by Bagaev et al. [21]. Briefly, the reads were aligned with Kallisto v0.42.4 and annotated using GENCODEv23 transcripts 69. The expressions of 20,062 protein coding genes were quantified as transcripts per million (TPM) and log2-transformed [22].

2.4. WES Processing

Alignment: Low-quality reads were filtered using FilterByTile/BBMap v37.90 and aligned to the human reference genome GRCh38 (GRCh38.d1.vd1 assembly) using BWA v0.7.17. Duplicate reads were removed using Picard’s v2.6.0 MarkDuplicates; indels were realigned by IndelRealigner and recalibrated by BaseRecalibrator and ApplyBQSR (last three tools from GATK v3.8.1).
Variant calling: Both germline and somatic single-nucleotide variations (sSNVs), small insertions, and deletions were all detected using Strelka v2.9. All variants, insertions and deletions were annotated using Variant Effect Predictor v92.1. Copy number alterations were evaluated with a customized version of Sequenza v2.1.2. Tumor cellularity (purity) estimation was determined via purity estimations (CPE), as previously described [23].

2.5. Tumor Microenvironment (TME) Classification

We collected three publicly available HNSCC datasets from the GEO and SRA databases (GSE30784 [24], GSE40774 [18], GSE65858 [11]), one internal cohort, and the TCGA-HNSC project [19] (total number of samples = 912). Clinical and mutation data were downloaded from the GDC TCGA data portal (MC3 dataset). Transcriptomic data were downloaded from the USCS XENA portal (https://xena.ucsc.edu/) as TPM units. We excluded samples that did not pass the quality control due to one of the following reasons: PCA outlier, low correlation with others within the cohort (<0.8 for Affymetrix platforms, <0.65 for Illumina platforms), low coverage and low phred scores for the RNA-seq, high non-human tissue contamination (>3%), or high percentage of duplicates (>80%). We developed an HPV status prediction algorithm on samples that passed quality control and selected 266 HPV-positive samples from public datasets for further TME subtypes identification. Twenty expression signatures from Bagaev et al. [21] and Batista da Costa et al. [18] were calculated using ssGSEA [11]. The PI3K pathway activity score was calculated using PROGENy [25]. Raw signature scores were median-centered and mad-scaled within a platform-based batch. TME subtype classification was based on Louvain clustering [26] on scaled signature scores.

2.6. TME Validation

To validate the TME classification, two independent HNSCC cohorts from Massachusetts Eye and Ear were utilized (n = 142 and n = 54 RNA-seq samples). Using the developed HPV status-prediction algorithm, 108 and 45 HPV-positive samples from these cohorts were selected, respectively. To predict TME subtypes, we used the K-nearest neighbor algorithm on pre-selected signatures with K = 45.

2.7. Determination of HPV Viral Status, Viral Expression, Subtypes, and Host-Viral Chimeric Reads

Viral read identification was based on the GATK Pathseq software kit 2.0 [27] with quantitative assessment expressed in VRM (viral read per million human reads). Viral status and type verification were performed using the VIRTUS pipeline [28]. The threshold for determining viral status as “positive” was set at 2 VRM. At this threshold, the number of raw reads makes it possible to evaluate the expression of viral transcripts. The mapping, scoring, and quantification of viral transcripts were also analyzed by ViGEN [29] for HPV16-positive (HPV16+) samples (NCBI Reference Sequence: NC_001526). HPV16+ samples were classified into four distinct subtypes (E2/E5, E6/E7, E1/E4, L1/L2) based on unsupervised Louvain clustering [26] of the median-scaled TPM-transformed viral gene expression. Additionally, we identified viral–host chimeric reads using Vi-Fi software [30]. The presence of viral–host chimeric reads was interpreted as viral integration in the host genome.

3. Results

3.1. Development of Single-Sample Host Expression-Based HPV Status Predictor

To develop an ML-based algorithm that predicts HPV status, publicly available expression data (n = 1013 HNSCC samples) were used for training (n = 799) and validation (n = 214) (Figure 1A, Table 1). The initial set of features for model development was composed of 158 genes distinguishing between HPV-positive and HPV-negative tumors according to the literature review [20]. To eliminate the platform batch effect, we included both RNA-seq and array platforms in training and validation sample sets, and used the rank-transformation of expression values for a selected set of genes. Using the ML-based feature selection algorithm, we subsequently reduced the number of features to 144 genes (Supplementary Table S1). After the final step of hyperparameter tuning on this feature set, the HPV status classifier was validated on an independent set of samples (n = 214), and showed high accuracy in predicting the HPV status for a single sample (ROC-AUC = 0.985) (Figure 1B).
In addition, the model was validated using a cohort of HNSCC samples from the clinical trial NCT03238365 [31]. Only pre-treatment samples were included to avoid the influence of the treatment on the prediction (n = 37). The HPV status classifier achieved a weighted F1-score of 83.4%, correctly predicting the HPV status for 31 of 36 samples. Using this cohort, we demonstrated the clear separation of samples into predicted HPV-positive and HPV-negative groups based on expression levels of the top 25 model genes having the highest feature importance metric (Figure 1C). Then, gene set overlap analysis was applied using gene collections (v6.1) from MsigDB [32] separated into 16 HPV-high and 9 HPV-low genes (correlating with high or low probability of HPV-driven HNSCC, respectively) (Figure 1D). HPV-high genes were enriched with gene sets of proliferation, mitotic spindle, and cell cycle, and included known genes such as CDKN2A and CDKN2B. HPV-low genes did not overlap significantly with any of the gene sets tested.

3.2. HPV-Positive HNSCC Tumors Can Be Characterized by Five Distinct Tumor Microenvironment Subtypes

To characterize the TME, our HPV status prediction algorithm was used, and 266 HPV-positive samples from publicly available expression datasets were selected (Figure 2A). We generated 20 Fges representing various immune populations (e.g., Treg cells, B cells, effector cells), stromal components (e.g., angiogenesis, cancer-associated fibroblasts (CAFs)), and tumor properties (e.g., proliferation rate, basal/keratinization). For each of the 266 HPV-positive samples, 19 signature activity scores were calculated using ssGSEA and PI3K pathway activity scores. Further applying unsupervised dense Louvain clustering [26] to these scores revealed five subtypes characterized by distinct TME composition and tumor features. Based on enriched signatures, they were termed moderately immune-enriched (IE/M), immune-enriched B cell (IE/B), fibrotic (F), immune-desert (D), and immune-enriched luminal-like (IE/L) (Figure 2B). Each subtype was characterized by a distinct signature enrichment pattern. For example, there was a high CAF signature in fibrotic TMEs and a high PI3K pathway score in immune-desert TMEs (Figure 2C). While all three immune-enriched subtypes showed relatively higher signature values compared to the other TMEs, samples in the IE/B subtype were enriched in B cell and follicular helper T cell (Tfh) signatures. Further, samples in the IE/L subtype were distinguished from other inflamed tumors by a very low basal/keratinization signature (Figure 2C). There were no associations between the tumor site and the TME subtype (Figure S1A). Importantly, TME subtypes were associated with OS and patient prognosis. Tumors with an immune-enriched microenvironment showed the highest survival rates, whereas patients with a fibrotic TME subtype had poor survival (Figure 2D). The independent validation of the TME subtypes was performed on an internal HPV-positive cohort (n = 45) (Figure 2E and Figure S1B).

3.3. HPV Transcript Expression Stratifies HPV-Positive HNSCCs into Four Subtypes Associated with Prognosis

To further elucidate the specific factors associated with the survival of HPV-positive HNSCC patients, patterns of HPV-specific transcript expression were investigated. By retrieving viral transcript expression data from bulk RNA-seq and using the unsupervised clustering of viral transcript expression values in the TCGA HNSCC HPV-positive dataset, four HPV-related subtypes were identified. Each subtype was enriched in distinct viral transcripts, expressed at different life cycle stages: E2/E5, E6/E7, E1/E4 and L1/L2 [33] (Figure 3A). The survival analysis of these subtypes and HPV-negative samples showed that the E6/E7 cluster was associated with the worst OS among HPV-positive samples, approaching HPV-negative tumors. The best prognosis was in patients belonging to the E2/E5 viral expression subtype (Figure 3B). To further investigate patterns of viral gene expression in HPV-positive squamous cell carcinoma (SCC), we performed the same clustering analysis using the TCGA cervical SCC (CESC) dataset and discovered the same four viral subtypes as for HNSCC (Figure 3C). Patients with the E6/E7 viral subtype of CESC had statistically indistinguishable progression-free survival (PFS) from HPV-negative CESC samples, and the poorest prognosis among all HPV-positive subtypes (Figure 3D). Prognostic stratification of all viral subtypes in CESC closely resembled that for HNSCC, with the E2/E5 cluster associated with the best prognosis (OS in HNSCC and PFS in CESC). The independent validation of HPV subtypes was performed on an internal HPV-positive cohort (n = 108) and revealed similar viral subtypes (Figure S1C,D).
The detection of viral–host mRNA fusions showed that the E2/E5 subtype was enriched for samples without HPV–genome integration, suggesting that HPV episomal DNA status and an E2/E5 expression pattern may drive an inflamed microenvironment and improved prognosis (Figure 3E). These findings were validated on CESC TCGA samples (Figure 3F).

3.4. E2/E5 HPV Subtype Associates with Immune-Enriched Subtypes

Utilizing both viral transcript and TME subtypes of the TCGA-HNSCC cohort, we found that the E2/E5 HPV subtype was associated with an immune-enriched TME (74%) (Figure 4A and Figure S1E). This finding is in concordance with the higher OS of E2/E5 and immune-enriched samples (IE/M, IE/B, IE/L) compared to the other subtypes. On the contrary, the E6/E7 subtype was associated with an immune-desert TME (50%) and comprised more samples with fibrotic TMEs (28%) than any other viral subtype. Among all HPV clusters, the E6/E7 subtype had the lowest levels of T and B cell gene expression signatures, resembling those of HPV-negative HNSCCs (Figure 4B). Patients with such tumors had poor prognosis according to both classifications. Interestingly, the E6/E7 subtype did not include any IE/L samples, suggesting that these HNSCCs were less differentiated and more resembled “basal” keratinocytes, which could explain the aggressiveness of such tumors. On the other hand, samples with the lowest basal signature from the IE/L subtype were mostly clustered within the L1/L2 viral subtype, consistent with the fact that L1 and L2 capsid proteins are more highly expressed toward the layers of the squamous epithelium [34]. Taking into account the higher differentiation of such keratinocytes, this finding is also in accordance with the good prognosis that was observed for patients from the IE/L and L1/L2 subtypes.

3.5. Distinct Genetic Features of Various Viral and TME Subtypes of HPV-Positive HNSCCs

Major differences in the genomic landscapes of HPV-positive and HPV-negative HNSCCs have been characterized previously, and were confirmed in this study, including lower rates of CDKN2A(p16) deletions and somatic mutations in genes TP53, MYC, CCND1, FAT1, EGFR, and CASP8 and higher rates of GSK3B, NBPF1, and TRAF3 mutations in HPV-positive tumors (Figure S2A,B). Interestingly, we revealed a statistically significant difference in PIK3CA hotspot mutations among HPV-positive and HPV-negative tumors (Figure S2C) that was previously suspected, but “data remained insufficient to establish a pattern” [35]. In this study, among all somatic mutations in the PIK3CA gene, the alterations E542K, E545K and E546K were found to be more prevalent in HPV-positive HNSCCs (p = 0.0002, n(HPV+) = 50, n(HPV−) = 391) (Figure S2D). Further, the rates of prevalent driver genomic alterations among TME and viral subtypes were compared. We observed that tumors characterized by E6/E7 HPV and D and F TME subtypes were genetically more similar to HPV-negative tumor subtypes (Figure 4C,D). For example, they more frequently harbored p16 deletions. Interestingly, these samples also had higher rates of CCND1 copy number gains, similar to HPV-negative tumors and consistent with higher CCND1 expression in HPV-negative tumors (Figure 1C). On the contrary, immune-enriched subtypes and subtypes that were associated with E2/E5 showed high rates of B2M alterations, both mutations and deletions, suggesting these tumors might rely on such alterations as a mechanism of immune escape. Another highly noticeable alteration associated with immune escape, PDL1 copy number gain is observed at high rates among most immune-enriched TME subtypes—IE/B and IE/L (Figure 4C,D). Finally, the immune-desert TME subtype was significantly enriched with KMT2D truncating mutations, whereas other subtypes harbored lower rates and mostly missense mutations in this chromatin-remodeling gene (Figure 4C).

3.6. APOBEC Activation in HPV-Positive HNSCC

APOBEC deaminases have been shown to play an important role in the mutagenesis of HPV-positive HNSCC [10,35,36]. Here, we have confirmed that APOBEC activation was higher in HPV-positive compared to HPV-negative samples using both mutational signature and gene expression signature analysis (Figure 4E). Among viral subtypes, APOBEC expression was higher in E2/E5 tumors (Figure 4F), consistently with theories that APOBEC is upregulated in response to episomal HPV as a host defense mechanism. However, in the TCGA CESC cohort, a statistically significant difference was observed only when comparing APOBEC expression in HPV-positive and HPV-negative (Figure S2E), but not among viral subtypes, and not at the AID/APOBEC mutational signature level (Figure S2E,F).

4. Discussion

HNSCC is the sixth most common cancer worldwide, and is composed of two distinct subtypes; carcinogen-driven and viral-driven. Although HPV-positive HNSCC has unique clinical and molecular characteristics from HPV-negative HNSCC, further efforts are needed to identify patient subgroups for personalized treatment. Here, we leveraged an AI-driven algorithm to understand the tumor and TME, and the contributions of HPV to tumor progression, discerning an intriguing relationship between viral transcript expression, TME subtypes, and prognosis in HPV-associated HNSCCs. The TME subtypes that we identified—immune-enriched, highly immune and B-cell enriched, fibrotic, immune-desert, and immune-enriched luminal—exhibit distinct correlations with survival and prognosis, highlighting the prognostic implications of the TME composition. Interestingly, the HPV E2/E5 transcript subtype, which was associated with an immune-enriched TME and an improved prognosis, was also enriched in tumors without HPV-genome integration. In contrast, the E6/E7 subtype was associated with a fibrotic or desert TMEs, characterized by reduced T-cell and B-cell gene expression and poorer survival. Therefore, the E6/E7 subtype might represent a more immune-resistant phenotype, which could necessitate treatment escalation or the selection of non-immune-mediating therapies.
These findings shed light on the intricate interplay between viral oncogenes, host immune response, and clinical outcomes in HPV-associated HNSCC, and underscore the potential utility of viral transcript profiling and TME characterization in prognostication and guiding personalized therapy, adding novel findings that build on and complement the existing literature in this area [1,2,3,4,10,37,38,39]. Despite the extensive and variable datasets used in this analysis, it is important to note that further validation is necessary to confirm these associations and assess their potential implications in the clinical setting. Of key importance will be expanding the size of validation cohorts to increase robustness and generalizability. Moreover, exploring the mechanistic basis of these observations could help uncover novel insights into the pathogenesis of HPV-associated HNSCC, and pave the way for new therapeutic approaches.

5. Conclusions

Our study emphasizes the importance of adopting a multi-dimensional approach to understanding tumor biology, which integrates viral genomics, host immunology, and clinical data to optimize patient care in HPV-associated HNSCC.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v17010004/s1, Figure S1: Validation of TME subtypes; Figure S2: Differential genomic events in HPV+ vs. HPV− HNSCCs; Table S1: List of features selected.

Author Contributions

Conceptualization, A.N., L.W., N.K. and D.L.F.; methodology, D.K., D.T. and A.S.; software, D.K., A.T. and Z.A.; validation, Z.A., K.A.L.K., J.L.J., R.F. and D.L.F.; formal analysis, A.N., D.K. and D.T.; investigation, D.K., A.T. and Z.A.; writing—original draft preparation, A.N.; writing—review and editing, A.N., N.S., J.H.B., K.A.L.K., J.L.J., R.F., L.W., N.K. and D.L.F.; visualization, A.S.; supervision, V.K., N.S., L.W. and N.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by BostonGene, Corp.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and code will be deposited online and made publicly available at the time of publication.

Acknowledgments

The results published here are in whole or part based upon data generated by The Cancer Genome Atlas (TCGA) managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov/.

Conflicts of Interest

L.W. has received research funding from Lilly and Novartis. D.L.F. has received research funding from BMS, Calico, Predicine, BostonGene, and Neogenomics. D.L.F. owns stocks from Illumina, Roche, and Qiagen. All BostonGene authors were employees thereof at the time the study was performed. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Näsman, A.; Holzhauser, S.; Kostopoulou, O.N.; Zupancic, M.; Ährlund-Richter, A.; Du, J.; Dalianis, T. Prognostic Markers and Driver Genes and Options for Targeted Therapy in Human-Papillomavirus-Positive Tonsillar and Base-of-Tongue Squamous Cell Carcinoma. Viruses 2021, 13, 910. [Google Scholar] [CrossRef]
  2. Krsek, A.; Baticic, L.; Sotosek, V.; Braut, T. The Role of Biomarkers in HPV-Positive Head and Neck Squamous Cell Carcinoma: Towards Precision Medicine. Diagnostics 2024, 14, 1448. [Google Scholar] [CrossRef] [PubMed]
  3. Naegele, S.; Das, D.; Hirayama, S.; Shalhout, S.Z.; Lee, H.; Richmon, J.D.; Faden, D.L. Circulating Tumor HPV DNA in Patients With Stage I and II HPV-Associated Head and Neck Cancer After Surgery. JAMA Otolaryngol. Head Neck Surg. 2024, 150, 521–523. [Google Scholar] [CrossRef] [PubMed]
  4. Lang Kuhs, K.A.; Faden, D.L.; Chen, L.; Smith, D.K.; Pinheiro, M.; Wood, C.B.; Davis, S.; Yeager, M.; Boland, J.; Cullen, M.; et al. Genetic variation within the human papillomavirus type 16 genome is associated with oropharyngeal cancer prognosis. Ann. Oncol. 2022, 33, 638–648. [Google Scholar] [CrossRef] [PubMed]
  5. Fridman, W.H.; Pagès, F.; Sautès-Fridman, C.; Galon, J. The immune contexture in human tumours: Impact on clinical outcome. Nat. Rev. Cancer 2012, 12, 298–306. [Google Scholar] [CrossRef] [PubMed]
  6. Chen, D.S.; Mellman, I. Oncology Meets Immunology: The Cancer-Immunity Cycle. Immunity 2013, 39, 1–10. [Google Scholar] [CrossRef]
  7. Liu, T.; Han, C.; Wang, S.; Fang, P.; Ma, Z.; Xu, L.; Yin, R. Cancer-associated fibroblasts: An emerging target of anti-cancer immunotherapy. J. Hematol. Oncol. 2019, 12, 86. [Google Scholar] [CrossRef] [PubMed]
  8. Schaaf, M.B.; Garg, A.D.; Agostinis, P. Defining the role of the tumor vasculature in antitumor immunity and immunotherapy. Cell Death Dis. 2018, 9, 115. [Google Scholar] [CrossRef]
  9. Lechien, J.R.; Descamps, G.; Seminerio, I.; Furgiuele, S.; Dequanter, D.; Mouawad, F.; Badoual, C.; Journe, F.; Saussez, S. HPV Involvement in the Tumor Microenvironment and Immune Treatment in Head and Neck Squamous Cell Carcinomas. Cancers 2020, 12, 1060. [Google Scholar] [CrossRef]
  10. Faden, D.L.; Ding, F.; Lin, Y.; Zhai, S.; Kuo, F.; Chan, T.A.; Morris, L.G.; Ferris, R.L. APOBEC mutagenesis is tightly linked to the immune landscape and immunotherapy biomarkers in head and neck squamous cell carcinoma. Oral Oncol. 2019, 96, 140–147. [Google Scholar] [CrossRef]
  11. Wichmann, G.; Rosolowski, M.; Krohn, K.; Kreuz, M.; Boehm, A.; Reiche, A.; Scharrer, U.; Halama, D.; Bertolini, J.; Bauer, U.; et al. The role of HPV RNA transcription, immune response-related gene expression and disruptive TP53 mutations in diagnostic and prognostic profiling of head and neck cancer. Int. J. Cancer 2015, 137, 2846–2857. [Google Scholar] [CrossRef] [PubMed]
  12. Rickman, D.S.; Millon, R.; De Reynies, A.; Thomas, E.; Wasylyk, C.; Muller, D.; Abecassis, J.; Wasylyk, B. Prediction of future metastasis and molecular characterization of head and neck squamous-cell carcinoma based on transcriptome and genome analysis by microarrays. Oncogene 2008, 27, 6607–6622. [Google Scholar] [CrossRef] [PubMed]
  13. Fountzilas, E.; Markou, K.; Vlachtsis, K.; Nikolaou, A.; Arapantoni-Dadioti, P.; Ntoula, E.; Tassopoulos, G.; Bobos, M.; Konstantinopoulos, P.; Fountzilas, G.; et al. Identification and validation of gene expression models that predict clinical outcome in patients with early-stage laryngeal cancer. Ann. Oncol. 2012, 23, 2146–2153. [Google Scholar] [CrossRef] [PubMed]
  14. Slebos, R.J.; Yi, Y.; Ely, K.; Carter, J.; Evjen, A.; Zhang, X.; Shyr, Y.; Murphy, B.M.; Cmelak, A.J.; Burkey, B.B.; et al. Gene expression differences associated with human papillomavirus status in head and neck squamous cell carcinoma. Clin. Cancer Res. 2006, 12 Pt 1, 701–709. [Google Scholar] [CrossRef]
  15. Chung, C.H.; Ely, K.; McGavran, L.; Varella-Garcia, M.; Parker, J.; Parker, N.; Jarrett, C.; Carter, J.; Murphy, B.A.; Netterville, J.; et al. Increased epidermal growth factor receptor gene copy number is associated with poor prognosis in head and neck squamous cell carcinomas. J. Clin. Oncol. 2006, 24, 4170–4176. [Google Scholar] [CrossRef] [PubMed]
  16. Pyeon, D.; Newton, M.A.; Lambert, P.F.; den Boon, J.A.; Sengupta, S.; Marsit, C.J.; Woodworth, C.D.; Connor, J.P.; Haugen, T.H.; Smith, E.M.; et al. Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers. Cancer Res. 2007, 67, 4605–4619. [Google Scholar] [CrossRef]
  17. Cohen, E.E.; Zhu, H.; Lingen, M.W.; Martin, L.E.; Kuo, W.L.; Choi, E.A.; Kocherginsky, M.; Parker, J.S.; Chung, C.H.; Rosner, M.R. A feed-forward loop involving protein kinase Calpha and microRNAs regulates tumor cell cycle. Cancer Res. 2009, 69, 65–74. [Google Scholar] [CrossRef] [PubMed]
  18. Keck, M.K.; Zuo, Z.; Khattri, A.; Stricker, T.P.; Brown, C.D.; Imanguli, M.; Rieke, D.; Endhardt, K.; Fang, P.; Brägelmann, J.; et al. Integrative analysis of head and neck cancer identifies two biologically distinct HPV and three non-HPV subtypes. Clin. Cancer Res. 2015, 21, 870–881. [Google Scholar] [CrossRef]
  19. Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 2015, 517, 576–582. [Google Scholar] [CrossRef]
  20. Chakravarthy, A.; Henderson, S.; Thirdborough, S.M.; Ottensmeier, C.H.; Su, X.; Lechner, M.; Feber, A.; Thomas, G.J.; Fenton, T.R. Human Papillomavirus Drives Tumor Development Throughout the Head and Neck: Improved Prognosis Is Associated with an Immune Response Largely Restricted to the Oropharynx. J. Clin. Oncol. 2016, 34, 4132–4141. [Google Scholar] [CrossRef] [PubMed]
  21. Bagaev, A.; Kotlov, N.; Nomie, K.; Svekolkin, V.; Gafurov, A.; Isaeva, O.; Osokin, N.; Kozlov, I.; Frenkel, F.; Gancharova, O.; et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell 2021, 39, 845–865.e7. [Google Scholar] [CrossRef]
  22. Goldman, M.J.; Craft, B.; Hastie, M.; Repečka, K.; McDade, F.; Kamath, A.; Banerjee, A.; Luo, Y.; Rogers, D.; Brooks, A.N.; et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 2020, 38, 675–678. [Google Scholar] [CrossRef] [PubMed]
  23. Aran, D.; Sirota, M.; Butte, A.J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 2015, 6, 8971. [Google Scholar] [CrossRef]
  24. Chen, C.; Méndez, E.; Houck, J.; Fan, W.; Lohavanichbutr, P.; Doody, D.; Yueh, B.; Futran, N.D.; Upton, M.; Farwell, D.G.; et al. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma. Cancer Epidemiol. Biomark. Prev. 2008, 17, 2152–2162. [Google Scholar] [CrossRef]
  25. Batista Da Costa, J.; Gibb, E.A.; Bivalacqua, T.J.; Liu, Y.; Oo, H.Z.; Miyamoto, D.T.; Alshalalfa, M.; Davicioni, E.; Wright, J.L.; Dall’era, M.A.; et al. Molecular Characterization of Neuroendocrine-like Bladder Cancer. Clin. Cancer Res. 2019, 25, 3908–3920. [Google Scholar] [CrossRef]
  26. Yi, M.; Nissley, D.V.; McCormick, F.; Stephens, R.M. ssGSEA score-based Ras dependency indexes derived from gene expression data reveal potential Ras addiction mechanisms with possible clinical implications. Sci. Rep. 2020, 10, 10258. [Google Scholar] [CrossRef] [PubMed]
  27. Schubert, M.; Klinger, B.; Klünemann, M.; Sieber, A.; Uhlitz, F.; Sauer, S.; Garnett, M.J.; Blüthgen, N.; Saez-Rodriguez, J. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat. Commun. 2018, 9, 20. [Google Scholar] [CrossRef] [PubMed]
  28. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
  29. Yasumizu, Y.; Hara, A.; Sakaguchi, S.; Ohkura, N. VIRTUS: A pipeline for comprehensive virus analysis from conventional RNA-seq data. Bioinformatics 2021, 37, 1465–1467. [Google Scholar] [CrossRef] [PubMed]
  30. Bhuvaneshwar, K.; Song, L.; Madhavan, S.; Gusev, Y. viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors. Front. Microbiol. 2018, 9, 1172. [Google Scholar] [CrossRef]
  31. Nguyen, N.-P.D.; Deshpande, V.; Luebeck, J.; Mischel, P.S.; Bafna, V. ViFi: Accurate detection of viral integration and mRNA fusion reveals indiscriminate and unregulated transcription in proximal genomic regions in cervical cancer. Nucleic Acids Res. 2018, 46, 3309–3325. [Google Scholar] [CrossRef]
  32. Liberzon, A.; Birger, C.; Thorvaldsdóttir, H.; Ghandi, M.; Mesirov, J.P.; Tamayo, P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015, 1, 417–425. [Google Scholar] [CrossRef] [PubMed]
  33. Burley, M.; Roberts, S.; Parish, J.L. Epigenetic regulation of human papillomavirus transcription in the productive virus life cycle. Semin. Immunopathol. 2020, 42, 159–171. [Google Scholar] [CrossRef] [PubMed]
  34. Cochicho, D.; Esteves, S.; Rito, M.; Silva, F.; Martins, L.; Montalvão, P.; Cunha, M.; Magalhães, M.; Gil da Costa, R.M.; Felix, A. PIK3CA Gene Mutations in HNSCC: Systematic Review and Correlations with HPV Status and Patient Survival. Cancers 2022, 14, 1286. [Google Scholar] [CrossRef] [PubMed]
  35. Faden, D.L.; Thomas, S.; Cantalupo, P.G.; Agrawal, N.; Myers, J.; DeRisi, J. Multi-modality analysis supports APOBEC as a major source of mutations in head and neck squamous cell carcinoma. Oral Oncol. 2017, 74, 8–14. [Google Scholar] [CrossRef]
  36. Faden, D.L.; Kuhs, K.A.L.; Lin, M.; Langenbucher, A.; Pinheiro, M.; Yeager, M.; Cullen, M.; Boland, J.F.; Steinberg, M.; Bass, S.; et al. APOBEC Mutagenesis Is Concordant between Tumor and Viral Genomes in HPV-Positive Head and Neck Squamous Cell Carcinoma. Viruses 2021, 13, 1666. [Google Scholar] [CrossRef] [PubMed]
  37. Koneva, L.A.; Zhang, Y.; Virani, S.; Hall, P.B.; McHugh, J.B.; Chepeha, D.B.; Wolf, G.T.; Carey, T.E.; Rozek, L.S.; Sartor, M.A. HPV Integration in HNSCC Correlates with Survival Outcomes, Immune Response Signatures, and Candidate Drivers. Mol. Cancer Res. 2018, 16, 90–102. [Google Scholar] [CrossRef] [PubMed]
  38. Zhang, Y.; Koneva, L.A.; Virani, S.; Arthur, A.E.; Virani, A.; Hall, P.B.; Warden, C.D.; Carey, T.E.; Chepeha, D.B.; Prince, M.E.; et al. Subtypes of HPV-Positive Head and Neck Cancers Are Associated with HPV Characteristics, Copy Number Alterations, PIK3CA Mutation, and Pathway Signatures. Clin. Cancer Res. 2016, 22, 4735–4745. [Google Scholar] [CrossRef] [PubMed]
  39. Johnson, M.E.; Cantalupo, P.G.; Pipas, J.M. Identification of Head and Neck Cancer Subtypes Based on Human Papillomavirus Presence and E2F-Regulated Gene Expression. mSphere 2018, 3, e00580-17. [Google Scholar] [CrossRef]
Figure 1. Expression-based ML classifier accurately predicts HPV status: (A) Expression-based HPV status predictor training and validation procedure. (B) ROC-AUC for HPV status predictor. (C) Heatmap showing average linkage clustering of the median scaled expression values for the top 25 model genes having the highest feature importance metric for the NCT03238365 cohort (n = 37). (D) Barplot of the most enriched gene sets among the top 16 HPV-high model genes. X-axis shows -Log10 (False Discovery Rate) values for each gene set.
Figure 1. Expression-based ML classifier accurately predicts HPV status: (A) Expression-based HPV status predictor training and validation procedure. (B) ROC-AUC for HPV status predictor. (C) Heatmap showing average linkage clustering of the median scaled expression values for the top 25 model genes having the highest feature importance metric for the NCT03238365 cohort (n = 37). (D) Barplot of the most enriched gene sets among the top 16 HPV-high model genes. X-axis shows -Log10 (False Discovery Rate) values for each gene set.
Viruses 17 00004 g001
Figure 2. A novel tumor microenvironment classification in HPV-positive HNSCC predicts survival outcome: (A) HNSCC expression datasets collection followed by HPV status prediction. Gene expression signature activation scores of HPV-positive samples (N = 266) were then used for dense clustering, which revealed 5 distinct TME clusters. (B) Heatmap showing functional gene expression signature activation scores for a meta-cohort of HPV-positive samples. The X axis represents samples, and the Y axis shows different signatures. (C) Boxplots showing statistically significant differences in signature scores among TME subtypes. ns, * p < 0.05, ** p < 0.01, *** p < 0.001. (D) Overall survival (OS) of patients from the TCGA HNSCC HPV-positive cohort stratified by TME subtype. (E) OS of patients from the HNSCC HPV-positive validation cohort stratified by TME subtype (Supplement Figure S1C).
Figure 2. A novel tumor microenvironment classification in HPV-positive HNSCC predicts survival outcome: (A) HNSCC expression datasets collection followed by HPV status prediction. Gene expression signature activation scores of HPV-positive samples (N = 266) were then used for dense clustering, which revealed 5 distinct TME clusters. (B) Heatmap showing functional gene expression signature activation scores for a meta-cohort of HPV-positive samples. The X axis represents samples, and the Y axis shows different signatures. (C) Boxplots showing statistically significant differences in signature scores among TME subtypes. ns, * p < 0.05, ** p < 0.01, *** p < 0.001. (D) Overall survival (OS) of patients from the TCGA HNSCC HPV-positive cohort stratified by TME subtype. (E) OS of patients from the HNSCC HPV-positive validation cohort stratified by TME subtype (Supplement Figure S1C).
Viruses 17 00004 g002
Figure 3. Four viral subtypes based on HPV transcript expression in HNSC and CESC are associated with survival and viral genome integration: (A) and (C) Heatmaps showing HPV16 transcript expression scores among distinct viral subtypes in TCGA HNSC and TCGA CESC datasets, respectively. The X axis represents samples, and the Y axis shows HPV16 genes. (B) Overall survival of patients from TCGA HNSC cohort stratified by viral subtypes classification. (D) Progression-free survival of patients from the TCGA CESC cohort stratified by viral subtypes’ classification. (E) and (F) Amount of samples with or without detected chimeric viral–host reads per viral subtype among TCGA HNSC HPV-positive and TCGA CESC HPV-positive cohorts, respectively.
Figure 3. Four viral subtypes based on HPV transcript expression in HNSC and CESC are associated with survival and viral genome integration: (A) and (C) Heatmaps showing HPV16 transcript expression scores among distinct viral subtypes in TCGA HNSC and TCGA CESC datasets, respectively. The X axis represents samples, and the Y axis shows HPV16 genes. (B) Overall survival of patients from TCGA HNSC cohort stratified by viral subtypes classification. (D) Progression-free survival of patients from the TCGA CESC cohort stratified by viral subtypes’ classification. (E) and (F) Amount of samples with or without detected chimeric viral–host reads per viral subtype among TCGA HNSC HPV-positive and TCGA CESC HPV-positive cohorts, respectively.
Viruses 17 00004 g003
Figure 4. TME subtypes are associated with viral classification and tumor genetics: (A) Distribution of TME subtypes in each viral subtypes in the TCGA HNSC HPV-positive cohort. The X axis represents samples, and the Y axis shows HPV16 genes. (B) Boxplots showing differences in B- and T-cells’ signature expression scores across all TCGA HNSC samples combined into three groups—E6/E7 subtype (n = 18), other HPV subtypes (n = 35; E2/E5, E1/E4, L1/L2 together) and HPV-negative (n = 388). (C) Oncoprint and (D) alteration rate of driver somatic alterations among different TME and viral subtypes for TCGA HNSCC cohort. (E) Boxplots showing APOBEC genes’ expression signature scores and mutational signature scores in HPV-positive and HPV-negative HNSCCs. (F) Boxplots showing APOBEC genes’ expression signature scores among viral subtypes of HPV-positive and HPV-negative HNSCCs.
Figure 4. TME subtypes are associated with viral classification and tumor genetics: (A) Distribution of TME subtypes in each viral subtypes in the TCGA HNSC HPV-positive cohort. The X axis represents samples, and the Y axis shows HPV16 genes. (B) Boxplots showing differences in B- and T-cells’ signature expression scores across all TCGA HNSC samples combined into three groups—E6/E7 subtype (n = 18), other HPV subtypes (n = 35; E2/E5, E1/E4, L1/L2 together) and HPV-negative (n = 388). (C) Oncoprint and (D) alteration rate of driver somatic alterations among different TME and viral subtypes for TCGA HNSCC cohort. (E) Boxplots showing APOBEC genes’ expression signature scores and mutational signature scores in HPV-positive and HPV-negative HNSCCs. (F) Boxplots showing APOBEC genes’ expression signature scores among viral subtypes of HPV-positive and HPV-negative HNSCCs.
Viruses 17 00004 g004
Table 1. Datasets used for HPV status classifier training and testing.
Table 1. Datasets used for HPV status classifier training and testing.
Cohort IDPlatformN SamplesCitationUsage
GSE65858GPL10558269Wichmann et al., 2015 [11]train
E-TABM-302GPL57073Rickman et al., 2008 [12]train
GSE25727GPL843254Fountzilas et al., 2012 [13]train
GSE3292GPL5701Slebos et al., 2006; Chung et al., 2006 [14,15]train
GSE6791GPL57042Pyeon et al., 2007 [16]train
GSE10300GPL57035Cohen et al., 2009 [17] train
GSE40774GPL13497134Keck et al., 2015 [18] test
TCGA_HNSCRNA-seq498The Cancer Genome Atlas Network,
2015 [19]
test_size—100 samples (~20%) with StratifiedShuffleSplit, the rest are train samples
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nikitina, A.; Kiriy, D.; Tyshevich, A.; Tychinin, D.; Antysheva, Z.; Sobol, A.; Kushnarev, V.; Shin, N.; Brown, J.H.; Lewis, J., Jr.; et al. Viral Transcript and Tumor Immune Microenvironment-Based Transcriptomic Profiling of HPV-Associated Head and Neck Squamous Cell Carcinoma Identifies Subtypes Associated with Prognosis. Viruses 2025, 17, 4. https://doi.org/10.3390/v17010004

AMA Style

Nikitina A, Kiriy D, Tyshevich A, Tychinin D, Antysheva Z, Sobol A, Kushnarev V, Shin N, Brown JH, Lewis J Jr., et al. Viral Transcript and Tumor Immune Microenvironment-Based Transcriptomic Profiling of HPV-Associated Head and Neck Squamous Cell Carcinoma Identifies Subtypes Associated with Prognosis. Viruses. 2025; 17(1):4. https://doi.org/10.3390/v17010004

Chicago/Turabian Style

Nikitina, Anastasiia, Daria Kiriy, Andrey Tyshevich, Dmitry Tychinin, Zoya Antysheva, Anastasya Sobol, Vladimir Kushnarev, Nara Shin, Jessica H. Brown, James Lewis, Jr., and et al. 2025. "Viral Transcript and Tumor Immune Microenvironment-Based Transcriptomic Profiling of HPV-Associated Head and Neck Squamous Cell Carcinoma Identifies Subtypes Associated with Prognosis" Viruses 17, no. 1: 4. https://doi.org/10.3390/v17010004

APA Style

Nikitina, A., Kiriy, D., Tyshevich, A., Tychinin, D., Antysheva, Z., Sobol, A., Kushnarev, V., Shin, N., Brown, J. H., Lewis, J., Jr., Lang Kuhs, K. A., Ferris, R., Wirth, L., Kotlov, N., & Faden, D. L. (2025). Viral Transcript and Tumor Immune Microenvironment-Based Transcriptomic Profiling of HPV-Associated Head and Neck Squamous Cell Carcinoma Identifies Subtypes Associated with Prognosis. Viruses, 17(1), 4. https://doi.org/10.3390/v17010004

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop