1. Introduction
Infection by the human papillomavirus (HPV) constitutes the most frequent sexually transmitted infection. Persistent infection by high-risk HPV types (most frequently HPV 16) is linked to the occurrence of anogenital neoplasia, most notably cervical and anal cancer [1]. Dysplastic lesions of the vagina, although less frequent than their cervical counterparts, share the same carcinogenic process, and an increase in the detection of precursor lesions has been registered in recent years [2].
Diagnosing these lesions poses a complex challenge due to their tendency to be multifocal [3]. Additionally, compared to the cervix, there is a weaker correlation between vaginoscopy patterns and histopathological classification. Moreover, concerns have been raised regarding the suboptimal accuracy and high intra- and interobserver variability of vaginoscopy [4]. The lower prevalence of vaginal cancers and precursor lesions, combined with the absence of standardized screening approaches, constitutes a hurdle for early diagnosis [5].
Differentiating between vaginal low-grade and high-grade squamous intraepithelial lesions (LSILs and HSILs, respectively) is pivotal to selecting the most appropriate management and preventing unnecessary treatments [6]. Indeed, approximately 50% of LSILs have the potential to regress. In contrast, HSILs are true precursor lesions and may progress to invasive squamous carcinoma, which occurs in about 12% of patients in whom an HSIL is identified. The suboptimal diagnostic yield of currently available diagnostic techniques has therefore prompted research into potential technological solutions [4].
Recently, artificial intelligence (AI) has attracted significant attention. Advances in computer performance have facilitated the successful development of deep learning models, particularly in medical fields involving image and video analysis. Convolutional neural networks (CNNs) are a deep learning approach specifically employed to recognize and classify image patterns. Inspired by neurobiological synaptic processing, these algorithms capture spatial hierarchies and patterns in images in an efficient manner [7].
In the anogenital region, primary AI research has focused mainly on accurately detecting and differentiating HSILs from LSILs in the cervix and anus during colposcopy and anoscopy procedures, respectively [8,9,10,11,12,13,14,15,16]. These advancements have the potential to improve diagnostic accuracy, standardize evaluations, and reduce interobserver variability. However, a CNN trained specifically for the cervix or anus does not ensure comparable accuracy when applied to the vagina. Due to the multifocal nature of HPV infection, lesions can also appear in the vagina, and, to our knowledge, there are no published works addressing the development of this type of AI algorithm for this area of the body.
The aim of this study is to develop and validate a CNN to automatically differentiate vaginal HSILs from LSILs using images retrieved from vaginoscopy exams.
2. Materials and Methods
A retrospective, non-interventional study was conducted using colposcopies performed at a single tertiary center (Centro Materno Infantil do Norte) in Porto, Portugal, between December 2022 and December 2023 (one year). A total of 57,250 frames of vaginal walls were collected and included in the dataset to develop, train, and validate the CNN-based model. This study was approved by the ethics committee (IRB 2023.157 (131-DEFI/123-CE)), in accordance with the principles of the Declaration of Helsinki.
All colposcopies were performed by a single expert according to the current standard of care, using the same Zeiss FC 150 colposcope (Oberkochen, Germany). Each procedure comprised up to four stages: initial non-stained observation, assessment after application of 3% acetic acid, assessment after Lugol's iodine staining, and, when indicated, therapeutic manipulation (e.g., laser ablation). Frames from all four stages were included in the dataset; note that each procedure may comprise any combination of these stages. Biopsies of suspected lesions were performed, and the collected tissue was preserved. The video segments of the observation of the vaginal walls were retrieved and segmented into still frames using VLC Media Player (VideoLAN, Paris, France).
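For illustration only, this segmentation step could equally be scripted; the sketch below uses FFMPEG (listed among this study's data-preparation tools), with a hypothetical input path and an assumed sampling rate of one frame per second.

```python
import subprocess
from pathlib import Path

def extract_frames(video: Path, out_dir: Path, fps: int = 1) -> None:
    """Extract still frames from a vaginoscopy video segment with FFmpeg.
    The sampling rate (1 frame/s) and output naming are assumptions."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}",
         str(out_dir / "frame_%06d.png")],
        check=True,
    )
```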
A total of 71 vaginoscopy procedures were ultimately included, and a dataset of 57,250 frames was assessed. Of the collected frames, 25,455 were classified as HSILs, and the remaining 31,795 were categorized as LSILs. The biopsy histopathology report from the vaginoscopy evaluation of the observed lesion was consistently referenced to classify each corresponding frame as either an HSIL or LSIL (Figure 1).
The dataset was split into training/validation (90%) and testing (10%) datasets, corresponding to 51,525 and 5725 frames, respectively. The testing set was used to independently validate the performance of the CNN. No patient split was performed, in order to maximize data use. The flowchart in Figure 2 describes the study's methodology.
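A minimal sketch of this split is shown below, assuming frame-level arrays of file paths and histopathology-derived labels; the class stratification and fixed random seed are assumptions not stated in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(frame_paths: np.ndarray, labels: np.ndarray):
    """Frame-level 90/10 train-validation/test split (no patient-level
    split, as noted above). Stratification and seed are assumptions."""
    return train_test_split(
        frame_paths, labels, test_size=0.10, stratify=labels, random_state=42
    )
```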
The CNN was built on a ResNet10 model pre-trained on ImageNet, a vast image dataset for object recognition [17]. ResNet10 is designed to extract features from input images efficiently, consisting of multiple convolutional layers, batch normalization layers, and residual connections. The model was custom-tailored for our classification task (HSIL vs. LSIL) by replacing the original classification layer with a custom classification head containing a fully connected layer followed by an activation function. This new layer mapped the extracted features to the number of classes and converted the scores into log probabilities, ensuring numerical stability during training with the negative log-likelihood loss. Hyperparameters, including the learning rate (0.0001), batch size (128), and number of epochs (10), were fine-tuned through trial and error. Data preparation used the FFMPEG, Pandas, and Pillow libraries, while the model was implemented in PyTorch. No data augmentation techniques were applied. The computational system included dual NVIDIA Quadro RTX™ 8000 GPUs (NVIDIA Corp., Santa Clara, CA, USA) and an Intel 2.1 GHz Xeon Gold 6130 processor (Intel, Santa Clara, CA, USA).
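As a hedged illustration of this architecture, the PyTorch sketch below replaces the backbone's original classification layer with a fully connected layer followed by LogSoftmax, paired with the negative log-likelihood loss. Since torchvision ships no ResNet10, ResNet18 stands in for the backbone, and the optimizer choice (Adam) is an assumption, as the study does not name one.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet18 stands in for the paper's ResNet10 backbone (torchvision
# does not provide a ResNet10); weights are pre-trained on ImageNet.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Custom classification head: a fully connected layer mapping the
# extracted features to the two classes (HSIL vs. LSIL), followed by
# LogSoftmax so that training with NLLLoss is numerically stable.
num_classes = 2
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, num_classes),
    nn.LogSoftmax(dim=1),
)

# Hyperparameters as reported in the text; the optimizer is assumed.
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
batch_size, num_epochs = 128, 10
```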
The model assigned each frame a probability of belonging to either the HSIL or LSIL class, and each frame's final classification corresponded to the category with the higher probability. The model's classifications were then compared to the gold standard, the corresponding histopathological diagnosis (Figure 3).
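A minimal inference sketch of this decision rule, under the assumptions of the previous block (two output classes in log-probability form):

```python
import torch

def classify_frames(model: torch.nn.Module, frames: torch.Tensor) -> torch.Tensor:
    """Return the index of the higher-probability class per frame
    (assumed coding: 0 = LSIL, 1 = HSIL)."""
    model.eval()
    with torch.no_grad():
        log_probs = model(frames)  # shape (N, 2): per-class log-probabilities
    return log_probs.argmax(dim=1)
```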
We performed 5-fold cross-validation to assess the robustness of the CNN in the training/validation stage. The training/validation dataset, comprising 90% of the total data, was split into five equal-sized folds using a class-stratified division. Five iterations were carried out in total: in each, four folds were used to train the model, while the fifth was used for validation, with the folds assigned to training and validation varying across iterations. After each iteration, the sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were determined. Additionally, the area under the conventional receiver operating characteristic curve (AUC-ROC) was calculated for each iteration. The results from each iteration of the training/validation phase allowed fine-tuning of specific hyperparameters, which were subsequently evaluated on the test dataset. During this stage, the remaining 10% of the data were used to independently assess the CNN's performance. The statistical analysis was performed using scikit-learn v0.22.2 [18].
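This evaluation protocol can be sketched as follows, assuming a frame-level label array and a hypothetical `train_fold` helper that wraps CNN training and returns predicted HSIL probabilities for the held-out fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, roc_auc_score

def cross_validate(labels: np.ndarray, train_fold) -> list[dict]:
    """Class-stratified 5-fold CV; `train_fold(train_idx, val_idx)` is a
    hypothetical helper returning predicted P(HSIL) for the validation
    fold (labels: 1 = HSIL, 0 = LSIL)."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    metrics = []
    for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
        y_true = labels[val_idx]
        y_prob = train_fold(train_idx, val_idx)
        y_pred = (y_prob >= 0.5).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        metrics.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "auc_roc": roc_auc_score(y_true, y_prob),
        })
    return metrics
```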
3. Results
The AI model was trained and tested using a dataset of 57,250 frames (25,455 HSILs and 31,795 LSILs, all confirmed through histopathological analysis). Of this dataset, 90% (51,525 frames) was utilized during the training/validation phase, while the remaining 10% (5725 frames) was used during the testing phase. The number of frames, patients, and lesions (HSILs and LSILs) included in each iteration of the five-fold cross-validation, during the training and testing phases, is detailed in Table 1.
During the training/validation phase, the five-fold cross-validation revealed a median sensitivity of 98.7% (95% CI 96.7–100.0%) and a median specificity of 99.1% (95% CI 98.1–100.0%). The median PPV was 98.9% (95% CI 97.6–100.0%), and the median NPV was 98.9% (95% CI 97.4–100.0%). The overall median accuracy was 98.9% (95% CI 97.9–99.8%). The mean AUC-ROC was 0.990 ± 0.004 (Figure 4).
Table 2 displays the diagnostic performance metrics of each iteration.
During the test phase, the CNN demonstrated a sensitivity of 99.6% and specificity of 99.7%, achieving an overall accuracy of 99.7%. The PPV and NPV were 99.6% and 99.7%, respectively. The F1 score was 99.6%.
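As a consistency check, the F1 score is the harmonic mean of the PPV (precision) and sensitivity (recall): F1 = 2 × (0.996 × 0.996)/(0.996 + 0.996) ≈ 0.996, matching the reported value.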
4. Discussion
To our knowledge, this is the first proof-of-concept CNN developed for the detection and differentiation of HPV-related dysplastic lesions in the vaginal region. The AI model effectively discriminated between HSILs and LSILs within vaginal frames, demonstrating exceptional sensitivity (99.6%) and specificity (99.7%). Such an advancement could contribute to improved colposcopic assessment of the female genital tract, increasing the detection rates of clinically important lesions and enhancing the overall cost-effectiveness of the procedure.
High-resolution colposcopic evaluation of the genital tract is a procedure that requires significant expertise. On the one hand, it is important to detect any deviations from the normal mucosal pattern. On the other, it is necessary to distinguish lesions that require treatment from those that can be monitored. This complexity can result in some lesions being overlooked and others being overtreated. Using a colposcope (with high resolution and magnification) facilitates easier and more accurate visualization of the cervical region and also allows for examination of the vaginal and vulvar areas. This comprehensive evaluation of the genital tract is the primary advantage of this device, considering that the effects of HPV can be multifocal, although mastering its use requires significant time and training [19].
The development and implementation of deep learning models in image-based specialties has accelerated rapidly in recent times, and these models may become increasingly indispensable tools for diagnostic and therapeutic procedures. Current AI models for gynecological assessment using high-resolution colposcopy are still in their early phase and primarily limited to the cervix [20]. However, considering the complexity of the female genital tract and fundamental data science principles, it is improbable that cervix-trained models can accurately detect and differentiate lesions in the vaginal walls. While reusing such models is intuitively appealing given the comprehensive nature of colposcopy, this approach would not be practical.
Therefore, this AI model marks an important milestone as the first AI model developed specifically for the vaginal area. The rigorous inclusion of only biopsied lesions for training brings the model closer to the ground truth, mitigating the risk of suboptimal training of the CNN. In addition, the model demonstrated high performance metrics not only on the test set but also during 5-fold cross-validation of the training set. This point is worth emphasizing, as it highlights the model's consistent efficiency in distinguishing HSIL from LSIL frames regardless of how the frames are distributed. Compared to the literature, which reports an overall accuracy of about 60% for colposcopy in detecting VAIN lesions, the development and integration of AI-enhanced colposcopy/vaginoscopy has the potential to substantially improve diagnostic accuracy and positively impact women's health [21,22].
Additional strengths of this study include the incorporation of still frames from the entire procedure, encompassing non-stained, stained, and subsequently manipulated frames (via biopsy or laser treatment). This comprehensive dataset exposes the model to a diverse range of clinically relevant visual information, including tissue alterations and the presence of blood. Moreover, the model was trained on HPV-related lesions situated at different locations on the vaginal walls and captured at different angles of visualization. This expanded exposure enhances the model's clinical utility by making it more adaptable to real-world clinical practice.
It is important to acknowledge some limitations of this methodology. First, we used still frames from procedures performed with colposcopes in the vaginal region, employing a non-annotated (frame-level) labelling methodology. This required the exclusion of complex images containing more than one lesion (e.g., different histologies within the same frame) to prevent model mis-training. Additionally, when evaluating certain vaginal areas, it may be necessary to isolate specific regions using tools (e.g., Kelly forceps) for accurate assessment. As a result, frames containing both the lesion and a tool had to be excluded, reducing the number of usable frames. This reduction in data could increase the risk of model overfitting, potentially making the model less effective in real clinical settings. Moreover, the absence of a patient-level split between the training and testing sets poses a risk of data leakage, as similar frames may have appeared in both groups, potentially inflating the model's performance metrics. Despite these challenges, we took steps to mitigate these risks by closely monitoring the model's performance on the validation and test sets to ensure an unbiased evaluation.
This study is retrospective and was conducted at a single center with a relatively small amount of data. Consequently, the possibility of demographic (selection) bias cannot be excluded, which may impact the external validity of the findings and suggests that performance may differ in real-world clinical settings with diverse populations. Moreover, since this model is based on data captured from only one type of device, we cannot guarantee its interoperability, which is increasingly regarded as an essential characteristic of clinically applicable AI models. Despite these limitations, technological advancements in healthcare often rely on incremental progress, which should be shared and acknowledged. This study serves as a proof of concept (technology readiness level 3, one step behind the current level of corresponding cervical AI models), demonstrating that AI models for the vaginal region can indeed be developed with adequate data and rigorous methodological techniques, although prospective, multicentric studies are still needed for validation.
5. Conclusions
To conclude, this AI algorithm demonstrated great promise in detecting and differentiating low-grade and high-grade squamous intraepithelial lesions (LSILs and HSILs, respectively) in the vagina, and it is the first AI model specifically developed and trained for this area. Further investigation is needed to evaluate broader applicability: this study is preliminary, retrospective, conducted at a single center, and based on still frames, all of which may affect the generalizability of the results. Nevertheless, the model's performance is a positive first step. Challenges such as potential selection bias and overfitting must be addressed in future investigations to fully assess the model's generalizability and clinical utility. This model represents the initial step towards a comprehensive, AI-powered colposcopic assessment encompassing the entire female genital tract.
Author Contributions
M.M. (Miguel Mascarenhas): conceptualization, study design, critical revision of the manuscript; I.A. and M.J.C.: study design, data acquisition, critical revision of the manuscript; M.M. (Miguel Martins), P.C., F.M., T.R., M.J.A. and J.M.: bibliographic review, study design, drafting of the manuscript, critical revision of the manuscript; J.F. (Joana Fernandes) and J.F. (João Ferreira): methodology, software, critical revision of the manuscript; T.M. and G.M.: supervision, critical revision of the manuscript; R.Z.: study design, critical revision of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Santo António University Hospital IRB 2023.157 (131-DEFI/123-CE).
Informed Consent Statement
Not applicable.
Data Availability Statement
Non-identifiable data will be made available upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Burd, E.M. Human papillomavirus and cervical cancer. Clin. Microbiol. Rev. 2003, 16, 1–17.
- Kurdgelashvili, G.; Dores, G.M.; Srour, S.A.; Chaturvedi, A.K.; Huycke, M.M.; Devesa, S.S. Incidence of potentially human papillomavirus-related neoplasms in the United States, 1978 to 2007. Cancer 2013, 119, 2291–2299.
- Chen, L.; Hu, D.; Xu, S.; Wang, X.; Chen, Y.; Lv, W.; Xie, X. Clinical features, treatment and outcomes of vaginal intraepithelial neoplasia in a Chinese tertiary centre. Ir. J. Med. Sci. 2016, 185, 111–114.
- Stuebs, F.A.; Koch, M.C.; Mehlhorn, G.; Gass, P.; Schulmeyer, C.E.; Hartman, A.; Strehl, J.; Adler, W.; Beckmann, M.W.; Renner, S.K. Accuracy of colposcopic findings in detecting vaginal intraepithelial neoplasia: A retrospective study. Arch. Gynecol. Obstet. 2020, 301, 769–777.
- Morrison, J.; Baldwin, P.; Hanna, L.; Andreou, A.; Buckley, L.; Durrant, L.; Edey, K.; Faruqi, A.; Fotopoulou, C.; Ganesan, R.; et al. British Gynaecological Cancer Society (BGCS) vulval cancer guidelines: An update on recommendations for practice 2023. Eur. J. Obstet. Gynecol. Reprod. Biol. 2024, 292, 210–238.
- Preti, M.; Joura, E.; Vieira-Baptista, P.; Van Beurden, M.; Bevilacqua, F.; Bleeker, M.C.G.; Bornstein, J.; Carcopino, X.; Chargari, C.; Cruickshank, M.E.; et al. The European Society of Gynaecological Oncology (ESGO), the International Society for the Study of Vulvovaginal Disease (ISSVD), the European College for the Study of Vulval Disease (ECSVD) and the European Federation for Colposcopy (EFC) Consensus Statements on Pre-invasive Vulvar Lesions. J. Low. Genit. Tract Dis. 2022, 26, 229–244.
- Richards, B.A.; Lillicrap, T.P.; Beaudoin, P.; Bengio, Y.; Bogacz, R.; Christensen, A.; Clopath, C.; Costa, R.P.; de Berker, A.; Ganguli, S.; et al. A deep learning framework for neuroscience. Nat. Neurosci. 2019, 22, 1761–1770.
- Chen, X.; Pu, X.; Chen, Z.; Li, L.; Zhao, K.N.; Liu, H.; Zhu, H. Application of EfficientNet-B0 and GRU-based deep learning on classifying the colposcopy diagnosis of precancerous cervical lesions. Cancer Med. 2023, 12, 8690–8699.
- Fang, S.; Yang, J.; Wang, M.; Liu, C.; Liu, S. An Improved Image Classification Method for Cervical Precancerous Lesions Based on ShuffleNet. Comput. Intell. Neurosci. 2022, 2022, 9675628.
- Mascarenhas, M.; Alencoão, I.; Carinhas, M.J.; Martins, M.; Cardoso, P.; Mendes, F.; Fernandes, J.; Ferreira, J.; Macedo, G.; Zulmira Macedo, R. Artificial Intelligence and Colposcopy: Automatic Identification of Cervical Squamous Cell Carcinoma Precursors. J. Clin. Med. 2024, 13, 3003.
- Miyagi, Y.; Takehara, K.; Miyake, T. Application of deep learning to the classification of uterine cervical squamous epithelial lesion from colposcopy images. Mol. Clin. Oncol. 2019, 11, 583–589.
- Xue, P.; Tang, C.; Li, Q.; Li, Y.; Shen, Y.; Zhao, Y.; Chen, J.; Wu, J.; Li, L.; Wang, W.; et al. Development and validation of an artificial intelligence system for grading colposcopic impressions and guiding biopsies. BMC Med. 2020, 18, 406.
- Yuan, C.; Yao, Y.; Cheng, B.; Cheng, Y.; Li, Y.; Li, Y.; Liu, X.; Cheng, X.; Xie, X.; Wu, J.; et al. The application of deep learning based diagnostic system to cervical squamous intraepithelial lesions recognition in colposcopy images. Sci. Rep. 2020, 10, 11639.
- Saraiva, M.M.; Spindler, L.; Fathallah, N.; Beaussier, H.; Mamma, C.; Quesnée, M.; Ribeiro, T.; Afonso, J.; Carvalho, M.; Moura, R.; et al. Artificial intelligence and high-resolution anoscopy: Automatic identification of anal squamous cell carcinoma precursors using a convolutional neural network. Tech. Coloproctol. 2022, 26, 893–900.
- Saraiva, M.M.; Spindler, L.; Fathallah, N.; Beaussier, H.; Mamma, C.; Quesnée, M.; Ribeiro, T.; Afonso, J.; Carvalho, M.; Moura, R.; et al. Deep Learning in High-Resolution Anoscopy: Assessing the Impact of Staining and Therapeutic Manipulation on Automated Detection of Anal Cancer Precursors. Clin. Transl. Gastroenterol. 2024, 15, e00681.
- Saraiva, M.M.; Spindler, L.; Manzione, T.; Ribeiro, T.; Fathallah, N.; Martins, M.; Cardoso, P.; Mendes, F.; Fernandes, J.; Ferreira, J.; et al. Deep Learning and High-Resolution Anoscopy: Development of an Interoperable Algorithm for the Detection and Differentiation of Anal Squamous Cell Carcinoma Precursors—A Multicentric Study. Cancers 2024, 16, 1909.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Nout, R.A.; Calaminus, G.; Planchamp, F.; Chargari, C.; Lax, S.; Martelli, H.; McCluggage, W.G.; Morice, P.; Pakiz, M.; Schmid, M.P.; et al. ESTRO/ESGO/SIOPe Guidelines for the management of patients with vaginal cancer. Int. J. Gynecol. Cancer 2023, 33, 1185–1202.
- Brandão, M.; Mendes, F.; Martins, M.; Cardoso, P.; Macedo, G.; Mascarenhas, T.; Mascarenhas Saraiva, M. Revolutionizing Women's Health: A Comprehensive Review of Artificial Intelligence Advancements in Gynecology. J. Clin. Med. 2024, 13, 1061.
- Monti, E.; Matozzo, C.M.M.; Cetera, G.E.; Di Loreto, E.; Libutti, G.; Boero, V.; Caia, C.; Alberico, D.; Barbara, G. Correlation Between Colposcopic Patterns and Histological Grade of Vaginal Intraepithelial Neoplasia: A Retrospective Cohort Study. Anticancer Res. 2023, 43, 4637–4642.
- Sopracordevole, F.; Barbero, M.; Clemente, N.; Fallani, M.G.; Cattani, P.; Agarossi, A.; de Piero, G.; Parin, A.; Frega, A.; Boselli, F.; et al. Colposcopic patterns of vaginal intraepithelial neoplasia: A study from the Italian Society of Colposcopy and Cervico-Vaginal Pathology. Eur. J. Cancer Prev. 2018, 27, 152–157.