Article

Building a DenseNet-Based Neural Network with Transformer and MBConv Blocks for Penile Cancer Classification

by Marcos Gabriel Mendes Lauande 1, Geraldo Braz Junior 1,*, João Dallyson Sousa de Almeida 1, Aristófanes Corrêa Silva 1, Rui Miguel Gil da Costa 2, Amanda Mara Teles 2, Leandro Lima da Silva 2, Haissa Oliveira Brito 2, Flávia Castello Branco Vidal 2, João Guilherme Araújo do Vale 1, José Ribamar Durand Rodrigues Junior 1 and António Cunha 3,4

1 Applied Computing Group (NCA-UFMA), Federal University of Maranhão, São Luís 65080-805, MA, Brazil
2 Postgraduate Program in Adult Health/PPGSAD, Federal University of Maranhão, São Luís 65080-085, MA, Brazil
3 School of Science and Technology, University of Trás-os-Montes e Alto Douro, Quinta de Prados, 5000-801 Vila Real, Portugal
4 ALGORITMI Research Centre, University of Minho, 4800-058 Guimarães, Portugal
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10536; https://doi.org/10.3390/app142210536
Submission received: 28 September 2024 / Revised: 7 November 2024 / Accepted: 13 November 2024 / Published: 15 November 2024

Abstract: Histopathological analysis is an essential exam for detecting various types of cancer, but the process is traditionally time-consuming and laborious. Deep learning models can assist the pathologist in the diagnostic process. In this work, a study was carried out based on the DenseNet neural network, changing its architecture through combinations of Transformer and MBConv blocks to investigate their impact on the classification of histopathological images of penile cancer. Due to the limited number of samples in the penile cancer dataset, pre-training was performed on a larger histopathological image dataset of lung and colon cancer. Various combinations of these architectural components were systematically evaluated to compare their performance. The results indicate significant improvements in feature representation, demonstrating the effectiveness of these combined elements, which achieved an F1-Score of up to 95.78%. This diagnostic performance confirms the importance of deep learning techniques in men’s health.

1. Introduction

Penile cancer, although rare, represents a significant challenge due to its incidence in developing regions, as it is linked to problems such as poor hygiene, smoking, phimosis, and sexually transmitted infections such as HPV (Human Papillomavirus) [1]. In Brazil, it is more prevalent among men over 50 in the North and Northeast regions. In the State of Maranhão, the incidence rate is alarmingly high compared to other states, with 6.15 cases per 100 thousand inhabitants [2,3]. Much feared by the male population, this type of tumor is associated with physical mutilation that can have serious social and psychological consequences for the individual.
To diagnose penile cancer, histopathological analysis of tissue samples is a fundamental procedure. Tissue sections are stained with Hematoxylin, which colors acidic components blue–purple, and Eosin, which colors basic components pink–red, enhancing contrast and improving the visibility of cellular structures when viewed under the microscope [4]. During histopathological examination, the pathologist assesses malignancy criteria, such as the lesion’s infiltrative or invasive growth patterns, cellular atypia (e.g., high nuclear pleomorphism), and a high number of mitotic cells. Malignant tumors are classified as squamous cell carcinomas based on the identification of these characteristics, including squamous cells with prominent intercellular bridges (desmosomes) and keratin pearl formation, which are visible under the microscope. However, despite being essential, this procedure is complex and time-consuming, relying heavily on the pathologist’s experience [5]. Given the complexity of this analysis, artificial intelligence-based solutions can assist pathologists, increasing diagnostic reliability and enabling early disease detection, which is crucial for effective treatment and improving cure rates.
With advances in machine learning and image processing, research has focused on analyzing histopathological images for diagnosing various types of cancer [6,7,8,9,10]. Regarding penile cancer, the first work published in the context of deep learning addressed the use of the DenseNet architecture combined with transfer learning to diagnose the disease [11]. This study highlighted the importance of building a specific dataset to validate the proposed method and emphasized the effectiveness of DenseNet in identifying complex patterns in images, demonstrating that transfer learning, preprocessing, and data augmentation are crucial to improving the performance of deep learning models.
In addition to the techniques mentioned above, attention mechanisms have shown promising results, such as a paper that proposed a method based on cascaded convolutional neural networks together with the Soft-Attention mechanism [12]. In the experiments, the Soft-Attention module proved essential for selecting the most significant characteristics, making the model robust and efficient. Also, ref. [13] demonstrated that the implementation of multiple attention mechanisms in DenseNet networks proved effective in improving the classification of histopathological images, highlighting the relevance of these techniques for the accurate diagnosis of penile cancer.
These previous works demonstrate that combining different modules or layers in advanced neural network architectures, attention techniques, and transfer learning can significantly improve the performance of penile cancer classification in histopathological images. Integrating ideas and techniques from previous studies is essential for constructing more robust and practical models for this pathology’s early and reliable diagnosis. With this goal, we developed neural network architectures using convolutional layers (MBConv) and attention mechanisms (Transformer). This design aims to improve the models’ generalization capacity and computational efficiency, especially for smaller datasets [14].
Our work produced hybrid deep neural network architectures based on DenseNet-201 [15] combined with Transformer and MBConv modules to explore their impact on the classification of histopathological images of penile cancer. Due to the limited number of samples in this dataset, we pre-trained the models on the larger LC25000 [16] dataset of lung and colon cancer images using transfer learning. By evaluating various combinations of these architectural components, we seek to identify the most effective configurations. Experimental results indicate significant improvements in feature representation and classification metrics.
With this work, we make the following contributions to the development of hybrid deep neural network models based on DenseNet-201 for analyzing histopathological images, particularly in the context of the binary classification problem of penile cancer (normal or squamous cell carcinoma):
  • Development of Hybrid Models: We introduced hybrid architectures based on DenseNet-201 that combine MBConv and Transformer blocks. These models are designed to enhance feature extraction, capturing both local and global patterns within histopathological images;
  • Evaluation of Block Combinations: We studied the effects of combining MBConv and Transformer blocks within the network, providing insights into how each contributes to performance metrics and feature representation for medical image analysis.
The remainder of this paper is organized as follows: Section 2 presents the literature review with works that provided a theoretical basis for the research. Section 3 presents the proposed method, the datasets used, and other relevant information. Section 4 details the main results and their discussion, and Section 5 presents the main conclusions, closing this paper.

2. Literature Review

Convolutional networks are deep learning architectures widely used in classifying histopathological images due to their performance and efficiency [17]. They use the convolution process, where the network learns feature extractor filters, allowing the capture of spatial patterns at different scales and levels of abstraction.
In this context, ref. [18] proposed a CAD (Computer-Aided Diagnosis) system to detect invasive ductal carcinoma in histopathological images using three convolutional neural network (CNN) models developed from scratch (ConvNet-A, ConvNet-B, and ConvNet-C, with 8, 9, and 19 layers, respectively). Among the architectures, the ConvNet-C model achieved the best accuracy (88.7%) and sensitivity (92.6%) with 100,000 images. Compared to traditional techniques such as SVM, KNN, Random Forest, and Logistic Regression, the CNN models performed better, showing their superiority in classifying histopathological images. Ref. [19] used convolutional neural networks (DenseNet-201, ResNet-50, and MobileNet V2) for breast cancer classification. The models were trained for binary and multi-class classification on the BreakHis [20] and BACH [21] datasets. MobileNet V2 stood out, achieving up to 98% accuracy in binary classification and up to 92% in multi-class classification, in addition to being faster and more efficient than DenseNet-201 and ResNet-50. Ref. [22] proposed two strategies to improve the early detection of lung and colon cancer using the LC25000 dataset. The first uses pre-trained CNNs, such as VGG, ResNet, and DenseNet, to extract features and increase diagnostic accuracy. The second combines automatic features of the ColonNet model with manual features for a more complete analysis. The approach showed promising results, with an accuracy of 96.31% and sensitivity of 95.67%, demonstrating its diagnostic effectiveness.
Despite their success, convolutional networks face challenges in capturing global dependencies and contextualizing distant regions within images. An alternative approach, which has emerged to address these limitations, is the Vision Transformer (ViT) [23]. This architecture uses a self-attention mechanism and, instead of applying convolutions, divides the image into patches (smaller blocks) and treats these patches as a sequence. The self-attention mechanism allows the model to capture global dependencies between image patches. In the context of histopathological imaging, ref. [24] investigated a ViT-based model for lung cancer classification, which achieved an accuracy of 98.84% with a patch size of 16 × 16. Other similar architectures were built upon ViT: the Swin Transformer (SwinT) [25] works with non-overlapping shifted windows, and an ensemble of Swin Transformers was proposed for classifying breast cancer subtypes in histopathological images [26]. That model achieved an average accuracy of 99.6% in binary classification and 96.0% in multi-class classification, surpassing previous works. Another work compared different variants of the Vision Transformer and their applications in classifying breast cancer images in digital histopathology [27]. These variants were compared to more recent models, such as PiT, CvT, CrossFormer, CrossViT, NesT, MaxViT, and SepViT. The models were tested on the BreakHis and IDC [28] datasets, with MaxViT obtaining the best performance, achieving 91.57% accuracy on BreakHis, 91.8% on IDC, and 92.12% when pre-trained on BreakHis and fine-tuned on IDC.
However, Vision Transformers also have limitations, especially when capturing finer local details [29], which are essential in diagnostic tasks where small cellular patterns can indicate disease. A promising approach to addressing these limitations is to combine the benefits of convolutions with the power of global attention. This allows models to detect pathological structures in complex histopathological images more effectively, addressing limitations such as loss of spatial information and the imprecise focus on extracted features. Following this concept of hybridization of neural networks with convolution and Transformers, one study proposed DACTransNet, a hybrid model for the automatic classification of histopathological images of pancreatic cancer [30]. Furthermore, the model incorporates deformable atrous spatial pyramids to improve results, and with the use of transfer learning, the model achieved up to 96% accuracy. Another study also proposed a hybrid model, but this time for classifying histopathological images of renal cell carcinoma [31]. The Renal Cancer Grading Network model uses an adaptive convolution (AC) block to efficiently extract spatial features and a dynamic attention (DA) block that applies Transformers to refine these representations. In tests with four public databases, the model achieved an accuracy of up to 99.7% on some datasets, demonstrating its robustness and effectiveness.
Finally, these works show how promising the hybridization of CNNs with Transformers is for the classification of histopathological images. The objective of this study is to analyze combinations of MBConv and Transformer blocks in a DenseNet-201 convolutional network in the context of penile cancer.

3. Materials and Methods

The proposed method (Figure 1) is based on the DenseNet-201 network, with parts pre-trained on ImageNet, and on combinations of blocks (Transformer and MBConv) used to build neural network architectures. These architectures were analyzed for their performance on a dataset related to penile cancer, focusing on a binary classification problem that distinguishes between normal tissue and squamous cell carcinoma. Furthermore, we use the transfer learning process [32] to improve results and reduce overfitting [33,34], because the target dataset has few samples. Therefore, the models are first trained on a dataset of histopathological images of lung and colon cancer and then trained on the target dataset.

3.1. Datasets

We used the LC25000 and PCPAm (Penile Câncer Dataset from Amazônia Legal) datasets to support the experiments. The LC25000 dataset contains 25,000 histopathological images (Figure 2) with a resolution of 768 × 768 pixels, divided proportionally into five classes (Table 1): lung cancer (benign, adenocarcinoma, and squamous cell carcinoma) and colon cancer (benign and adenocarcinoma). This dataset was used to optimize the neural networks, through transfer learning, before training on the main penile cancer image base, which has a limited number of samples. The main dataset, PCPAm [11], was built from samples collected in hospitals in the State of Maranhão (Brazil). It has 194 images (Figure 3) with a resolution of 2048 × 1536 pixels, divided by magnification (40× and 100×) and belonging to two classes, normal or cancer (squamous cell carcinoma), as can be seen in Table 2.

3.2. Preprocessing

As in previous work [11], the CLAHE (Contrast Limited Adaptive Histogram Equalization) algorithm was applied to the images to improve their lighting distribution, after resizing them to 224 × 224 pixels and converting them from the RGB to the YUV color space. This conversion is necessary because, in the RGB color space, all channels carry color information, which can cause undesirable effects on the coloring of the images after equalizing these channels. To avoid this problem, the CLAHE algorithm was applied only to the Y channel of the YUV [35] color space: the Y channel represents the light intensity information that needs to be enhanced, while the U and V channels carry the color information. Since we used RGB images as input to the network, we converted them back to this color space after processing. We experimented with CLAHE in various ways, and this approach yielded the best results. After processing, the cellular structures present a more pronounced contrast, facilitating their distinction from the background, which is crucial for the accurate analysis of histopathological images [11].
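As an illustrative sketch of this preprocessing step (not the authors' exact code), the Y-channel equalization could be implemented with OpenCV as follows; the clip limit and tile size are assumed values, since they are not reported here.

```python
# Hypothetical sketch of the CLAHE preprocessing described above (assumed parameters).
import cv2
import numpy as np

def preprocess(image_rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize, convert RGB -> YUV, apply CLAHE to the Y channel only, convert back to RGB."""
    resized = cv2.resize(image_rgb, (size, size), interpolation=cv2.INTER_AREA)
    yuv = cv2.cvtColor(resized, cv2.COLOR_RGB2YUV)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed values
    yuv[:, :, 0] = clahe.apply(yuv[:, :, 0])  # equalize luminance, leave U and V untouched
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)
```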

3.3. Neural Network Architecture

The neural network architecture developed in this work has four stages (Figure 4). S0 contains the initial convolutions, and S1 contains the first dense block and transition layer of DenseNet-201, both pre-trained on ImageNet. The other stages, S2 and S3, were the targets of the block combinations. The resulting versions were compared regarding their performance in classifying medical images.
Dense blocks are derived from DenseNets, which emerged as an effective solution to tackle the vanishing gradient problem in deep neural networks. Each DenseNet comprises several dense blocks containing convolutional operations and direct connections between layers. These connections, similar to those present in ResNets [36] but using concatenation instead of summation, allow knowledge to be shared more efficiently between layers. Furthermore, each dense block is separated by a transition layer composed of a convolution layer and a pooling layer (Figure 5), which reduces the resolution of the generated feature maps, preparing them to be processed through subsequent layers.
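For illustration, a dense layer and a transition layer of the kind described above can be sketched in PyTorch as follows; the growth rate and channel counts are illustrative and do not reproduce the exact DenseNet-201 configuration.

```python
# Minimal sketch of a DenseNet-style dense layer and transition layer (illustrative sizes).
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_ch: int, growth: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, 4 * growth, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * growth), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth, growth, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Concatenation (not summation) keeps earlier feature maps accessible to later layers.
        return torch.cat([x, self.body(x)], dim=1)

class Transition(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),  # halves the spatial resolution of the feature maps
        )

    def forward(self, x):
        return self.body(x)
```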
MBConv blocks (Figure 6) [37,38], derived from depthwise convolutions, are particularly efficient for capturing spatial interactions. The key feature of these blocks is the “inverted bottleneck” design, which expands the input channel size by four times and subsequently projects this expanded state back to the original channel size, allowing a residual connection. This technique is similar to the FFN (Feed-Forward Network) module found in Transformers, where a fully connected layer expands and reduces the number of channels, facilitating gradient propagation and learning in very deep networks [14]. In addition to the inverted bottleneck design, MBConv blocks in EfficientNet incorporate the Squeeze-and-Excitation (SE) [38,39] mechanism, which is essential for improving the network’s efficiency in capturing interdependent relationships between channels. SE squeezes spatial information globally, condensing it into a single value per channel. Next, an excitation phase occurs, where weights are learned to selectively highlight the most relevant channels, amplifying the most valuable features for the specific task. This process allows the network to dynamically adapt to the most essential characteristics, improving generalization capacity and performance with a minimal increase in computational cost.
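A minimal PyTorch sketch of an MBConv block with Squeeze-and-Excitation, following the inverted-bottleneck description above, is given below; the activation choices and the SE reduction ratio are assumptions rather than the exact blocks used in the experiments.

```python
# Sketch of an MBConv block with Squeeze-and-Excitation (expansion factor 4, as described above).
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, ch: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        w = x.mean(dim=(2, 3))            # squeeze: global average per channel
        w = self.fc(w)[:, :, None, None]  # excitation: learned per-channel weights
        return x * w

class MBConv(nn.Module):
    def __init__(self, ch: int, expansion: int = 4):
        super().__init__()
        hidden = ch * expansion
        self.body = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),  # depthwise convolution
            nn.BatchNorm2d(hidden), nn.GELU(),
            SqueezeExcite(hidden),
            nn.Conv2d(hidden, ch, 1, bias=False), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection around the inverted bottleneck
```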
Transformer blocks (Figure 7) [23] are structured to capture global relationships between image patches using self-attention mechanisms. Each block comprises four main components: a normalization layer (Layer Norm), a multi-head attention layer, a second normalization layer, and an MLP layer (Multi-Layer Perceptron). The multi-head attention layer allows the model to consider different aspects of spatial relationships when computing similarity weights between patches, generating attention maps that help the network focus on essential areas of the image. Additionally, two residual connections are implemented to ensure that information flows efficiently across layers, helping maintain model stability during training and facilitating gradient propagation in deep networks.
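The structure just described can be sketched in PyTorch as follows; this sketch uses standard self-attention over patch tokens with illustrative dimensions, whereas the blocks used in this work adopt relative attention, as noted below.

```python
# Sketch of a Transformer block (LayerNorm, multi-head attention, LayerNorm, MLP, two residuals).
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, tokens):  # tokens: (batch, num_patches, dim)
        x = self.norm1(tokens)
        attn_out, _ = self.attn(x, x, x)                # self-attention across all patches
        tokens = tokens + attn_out                      # first residual connection
        tokens = tokens + self.mlp(self.norm2(tokens))  # second residual connection
        return tokens
```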
Using these blocks, we developed six neural network architectures. In stages S0 and S1, the architectures consist of the initial layers from a pre-trained DenseNet, which we refer to as “I”, followed by a dense block with a transition layer, denoted as “D”. In the subsequent stages, S2 and S3, we employed various combinations of MBConv blocks (denoted as “M”) with squeeze-and-excitation (SE) and Transformer blocks (denoted as “T”) with relative attention, in which the relative positions between patches are used instead of absolute positions for computational efficiency. Additionally, in the second stage, a dense block may be included as part of the configuration. This design enables a flexible arrangement that leverages DenseNet’s feature extraction in the early stages, while later stages benefit from MBConv and Transformer blocks to enhance representational capacity and efficiency [14,40]. These architectures were trained in two phases, first on a base of histopathological images of the colon and lung and then trained and tested on a base of images related to penile cancer, applying the transfer learning technique since the latter has few images.
Finally, six different architectural configurations were implemented and tested, with the aim of providing a detailed understanding of the impact of these integrations. These architectures are described below; a rough sketch of how one of them can be assembled follows the list:
  • I + D + D + M: The second dense block and the transition layer from DenseNet-201 are used in stage S2. The process is completed with two MBConv blocks, operating with 768 channels in stage S3. This network has 8,959,874 learnable parameters.
  • I + D + D + T: The second dense block and the transition layer from DenseNet-201 are used in stage S2. Stage S3 of the chain utilizes two Transformer blocks with 768 channels. This network has 9,309,714 learnable parameters.
  • I + D + T + T: Stage S2 includes five Transformer blocks with 384 channels. The process concludes with an additional two Transformer blocks with 768 channels for greater depth in stage S3. This network has 16,219,002 learnable parameters.
  • I + D + M + T: It includes five MBConv blocks with 384 channels in stage S2. The design is completed with two Transformer blocks with 768 channels in stage S3. This network has 15,577,746 learnable parameters.
  • I + D + T + M: It includes five Transformer blocks with 384 channels in stage S2. The sequence ends with two MBConv blocks with 768 channels in stage S3. This network has 15,875,306 learnable parameters.
  • I + D + M + M: It includes five MBConv blocks with 384 channels in stage S2 and concludes with two additional MBConv blocks with 768 channels in stage S3. This network has 15,234,050 learnable parameters.
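As a rough sketch (reusing the MBConv class sketched earlier and treating the stage boundaries, strides, downsampling convolutions, and the classification head as simplified assumptions), the I + D + M + M configuration could be assembled along the following lines.

```python
# Hypothetical assembly of the I + D + M + M configuration (channel counts follow the
# description above: 384 channels in stage S2 and 768 channels in stage S3).
import torch.nn as nn
from torchvision.models import densenet201

def build_i_d_m_m(num_classes: int = 2, dropout: float = 0.35) -> nn.Module:
    backbone = densenet201(weights="IMAGENET1K_V1").features
    stages_s0_s1 = nn.Sequential(            # "I" + first "D": conv stem, dense block 1, transition 1
        backbone.conv0, backbone.norm0, backbone.relu0, backbone.pool0,
        backbone.denseblock1, backbone.transition1,
    )
    s2 = nn.Sequential(nn.Conv2d(128, 384, 1), *[MBConv(384) for _ in range(5)])             # five MBConv blocks
    s3 = nn.Sequential(nn.Conv2d(384, 768, 1, stride=2), *[MBConv(768) for _ in range(2)])   # two MBConv blocks
    head = nn.Sequential(                     # pooling + 256-neuron ReLU layer + dropout + softmax
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(768, 256), nn.ReLU(inplace=True), nn.Dropout(dropout),
        nn.Linear(256, num_classes), nn.Softmax(dim=1),
    )
    return nn.Sequential(stages_s0_s1, s2, s3, head)
```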
After the feature extractor stages, all architectures have an average pooling layer, a dense layer of 256 neurons with ReLU activation, dropout, and a Softmax layer for this binary classification task (presence or absence of cancer). For the experiments, we compared the performance of the models using the accuracy, precision, recall, and F1-Score metrics, computed for the class that represents the presence of cancer. The F1-Score was selected as the primary criterion for model evaluation because it combines precision and recall, which are both essential indicators for accurately identifying positive (cancer) samples. This metric is particularly well suited for assessing performance in medical image classification, where accurately detecting positive cases is essential.
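As a small illustration of this evaluation (assuming the cancer class is encoded as label 1, a labeling convention not stated here), the per-fold metrics could be computed as follows.

```python
# Metrics computed on the positive (cancer) class, as described above.
# The label encoding (cancer = 1) is an assumption for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate_fold(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=1),
        "recall": recall_score(y_true, y_pred, pos_label=1),
        "f1": f1_score(y_true, y_pred, pos_label=1),  # primary model-selection criterion
    }

# Example: evaluate_fold([1, 0, 1, 1], [1, 0, 0, 1]) -> accuracy 0.75, recall ~0.667, f1 0.8
```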

3.4. Experiments

The experiments were carried out using a 16 GB NVIDIA RTX 4060 Ti graphics processing unit, and our codes were developed in the Python programming language with support from the PyTorch [41] library (version 2.4.1).
In addition to the ImageNet pre-training of the initial DenseNet-201 layers, multi-class training was carried out on the LC25000 database, whose colon and lung cancer images were resized to a resolution of 224 × 224 pixels. The training method adopted was holdout, with 80% of the data for training and 20% for testing. Pre-processing included data augmentation techniques (horizontal flip, vertical flip, and rotation) on the training images and the application of CLAHE (Contrast Limited Adaptive Histogram Equalization). The Adam algorithm was used to train the neural networks. The training hyperparameters, determined empirically after several experimental runs, included a learning rate of 0.00001, 50 epochs, a batch size of 32, and a dropout rate of 0.35.
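A sketch of this pre-training phase under the reported hyperparameters is shown below; the data-loader construction, the rotation range, and the assumption that the model outputs raw class logits are illustrative choices rather than details taken from the paper.

```python
# Sketch of the LC25000 pre-training setup (Adam, lr 1e-5, 50 epochs, batch size 32).
from torch import nn, optim
from torchvision import transforms

# Augmentations listed above; the rotation range is an assumed value.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
])

def pretrain(model: nn.Module, train_loader, device="cuda", epochs=50, lr=1e-5):
    # train_loader is assumed to yield CLAHE-processed, augmented LC25000 batches.
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()  # five-class problem; assumes the model outputs raw logits
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```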
The second phase of the experiments involved training on the PCPAm penile cancer database with the same pre-processing techniques and images resized to 224 × 224 pixels. The networks previously trained on the LC25000 database were adapted to the binary classification problem, with the feature extraction layers initially frozen, followed by a fine-tuning process with all layers active. The five-fold cross-validation method (k = 5) was used: within the four training folds, 80% of the data was used for training and 20% for validation, while the remaining fold was reserved for testing. The Adam algorithm was used to train the neural networks. The training hyperparameters were determined empirically through several experimental runs. An initial learning rate of 0.0005 was used, followed by a reduced learning rate of 0.00005 during fine-tuning. The training process consisted of 30 initial epochs, followed by an additional 8 epochs of fine-tuning, with a batch size of 32 and a dropout rate of 0.35.
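The two-phase fine-tuning with five-fold cross-validation could be organized as in the following sketch; the `model.features` attribute name, the `run_epochs` helper, and the fold handling are assumptions for illustration only.

```python
# Sketch of the two-phase fine-tuning described above: head-only training for 30 epochs at
# lr 5e-4 with the feature extractor frozen, then 8 more epochs at lr 5e-5 with all layers active.
import torch
from torch import optim
from sklearn.model_selection import StratifiedKFold

def run_epochs(model, loader, opt, epochs, device):
    criterion = torch.nn.CrossEntropyLoss()  # assumes the model outputs raw class logits
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

def finetune_fold(model, train_loader, device="cuda"):
    model = model.to(device)
    # Phase 1: freeze the feature-extraction stages and train the head only.
    for p in model.features.parameters():
        p.requires_grad = False
    opt = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=5e-4)
    run_epochs(model, train_loader, opt, epochs=30, device=device)
    # Phase 2: unfreeze everything and fine-tune with a lower learning rate.
    for p in model.parameters():
        p.requires_grad = True
    opt = optim.Adam(model.parameters(), lr=5e-5)
    run_epochs(model, train_loader, opt, epochs=8, device=device)
    return model

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# for train_idx, test_idx in skf.split(images, labels):
#     build loaders for each fold, call finetune_fold, then evaluate on the held-out fold
```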

4. Results and Discussion

The results, analyzed based on the F1-Score and shown in Table 3 and Table 4, highlight the performance in classifying histopathological images of penile cancer. Performance metrics such as precision, recall, accuracy, and F1-Score show promising results across the various architectures. The best results were observed for the I + D + M + M architecture at 40× magnification (experiment 6), which achieved an accuracy of 94.95%, a recall of 98.18%, a precision of 94.05%, and an F1-Score of 95.78%. In this case, the insertion of MBConv blocks in stages 2 and 3 yields good results, with more layers capturing local features. Adding MBConv blocks in stage 3 after a DenseNet block also produced interesting results (experiment 1). Furthermore, adding Transformer blocks in stages 2 and 3 (experiment 3) resulted in a lower F1-Score than the other architectures in the conducted experiments, as did adding them only in stage 2 (experiment 5). This suggests that Transformers may be more effectively utilized in the final stage, where they can capture more global features, following stages that include MBConv or DenseNet blocks, as seen in experiments 2 and 4.
For 100× magnification, the I + D + D + M architecture (experiment 1) achieved an accuracy of 93.79%, recall of 100.00%, precision of 90.71%, and F1-score of 94.99%. In this case, the use of MBConv blocks in the final stage, after the DenseNet block, can lead the model to good performance, as well as in stage 2 before the Transformer blocks (experiment 4). Additionally, the results indicate that the F1-Score tends to decrease with the use of Transformers in the final stage when preceded by another DenseNet stage (experiment 2) or other Transformer blocks (experiment 3). Similarly, this metric was low when using MBConv in the final stage if it follows another stage that contains Transformer blocks (experiment 5) or MBConv blocks (experiment 6).
Furthermore, changing the image magnification substantially affects the performance of some architectures. In contrast, others, such as I + D + D + M and I + D + M + T, present a robust F1-Score at both magnifications (40× and 100×), providing multi-scale solutions. The F1-Score values in these configurations show only a small difference, suggesting that they maintain consistent performance regardless of the magnification. This consistency is attributed to the effective combination of MBConv and Transformer block types in DenseNet, which allows for extracting relevant features at different scales.
Additionally, Table 5 compares the proposed approach with other related works that use the same dataset for evaluation. We highlight promising results in this study, which surpass some works [12,13]. One of these works uses a soft attention layer at the end of DenseNet-121 [12], instead of replacing blocks in the network, as we have performed in some experiments. At the same time, another employs DenseNet-201 with an attention mechanism on the skip connections [13]. However, our results indicate that replacing DenseNet blocks with other types may be a viable strategy for improving model outcomes.
We present some qualitative results in Figure 8. Analyzing image classification models with Gradient-weighted Class Activation Mapping (Grad-CAM) [42] can provide valuable insights into which regions of the images the model considers most important for its decisions. By applying Grad-CAM to the models that performed best in the study (specifically, I + D + M + M for 40× and I + D + D + M for 100×), we observe in Figure 8 that the neural networks assigned significant importance to a considerable portion of the regions within cancerous images, particularly areas containing keratin pearls and infiltrative growth. However, in all cancer samples, the tissues are entirely tumorous. For images classified as normal, the neural networks placed greater emphasis on the edges of the tissues, which are well defined in this type of sample.
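For reference, a minimal hook-based Grad-CAM sketch is shown below; the choice of target layer and the tensor shapes are assumptions, and this is not the exact implementation used to generate Figure 8.

```python
# Minimal Grad-CAM sketch using forward/backward hooks on a chosen convolutional layer.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """image: (C, H, W) tensor; target_layer: assumed to be the last convolutional layer."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))
    model.eval()
    scores = model(image.unsqueeze(0))
    scores[0, class_idx].backward()          # backpropagate the score of the chosen class
    h1.remove(); h2.remove()
    acts, grads = activations[0], gradients[0]            # both (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()   # normalized heatmap over the input
```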

5. Conclusions

This research explored integrating Transformer and MBConv blocks into the DenseNet-201 neural network for classifying histopathological images of penile cancer. Our experiments demonstrated that some models trained from the created architectures achieved significant results, particularly in accuracy, precision, recall, and F1-Score. Based on the analysis of these results, we identified several architectures with interesting outcomes, such as I + D + D + M (40× and 100×), I + D + M + M (40×), and I + D + M + T (40× and 100×). Furthermore, the first and third of these architectures proved to be multi-scale solutions, performing well at both magnifications. Using the best architecture for each magnification, we applied the Grad-CAM algorithm to identify the regions of the images to which the neural network assigned more importance, obtaining valuable insights regarding its behavior in classification. Additionally, we compared our approach with other studies, highlighting its competitive performance, which surpassed some other works in image classification. However, some limitations, such as the small dataset size, still need to be addressed in future research. An increase in dataset size could significantly enhance model performance. Nevertheless, obtaining patient samples and producing additional images presents substantial challenges, requiring collaboration with healthcare researchers, hospitals, and ethics committees.
In future work, we will investigate automated optimization methods to identify the best combination and configuration of neural network blocks for specific tasks. This would include exploring different depths and variations of the blocks used, opening the possibility of adopting other types of components to build a specialized neural network architecture. In addition, new image preprocessing techniques can be explored to improve image quality before model training. This is particularly relevant in medical contexts, where image quality can vary significantly. We also plan to explore new labels in the PCPAm dataset, such as tumor grade and the presence of HPV.

Author Contributions

Data curation, R.M.G.d.C., A.M.T., L.L.d.S., H.O.B. and F.C.B.V.; Conceptualization, M.G.M.L., G.B.J. and J.D.S.d.A.; methodology M.G.M.L., J.G.A.d.V., J.R.D.R.J., G.B.J. and J.D.S.d.A.; validation, A.C.S., A.C., J.D.S.d.A. and R.M.G.d.C.; formal analysis, A.C.S., R.M.G.d.C. and G.B.J.; investigation, M.G.M.L. and G.B.J.; writing—original draft preparation, M.G.M.L., G.B.J. and J.D.S.d.A.; writing—review and editing, M.G.M.L., G.B.J., J.G.A.d.V. and J.R.D.R.J.; supervision, G.B.J.; project administration, G.B.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Fundação para a Ciência e Tecnologia, IP (FCT) within the R&D Units Project Scope: UIDB/00319/2020 (ALGORITMI); the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)—Finance Code 001 and CAPES PDPG Amazônia Legal 0810/2020—88881.510244/2020-01; the Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA); and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. ACS. Tests for Penile Cancer—American Cancer Society. 2024. Available online: https://www.cancer.org/cancer/penile-cancer/detection-diagnosis-staging/how-diagnosed.html (accessed on 13 June 2024).
  2. Coelho, R.; Pinho, J.; Moreno, J.; Garbis, D.; Nascimento, A.; Larges, J.; Calixto, J.; Ramalho, L.; Silva, A.; Nogueira, L.; et al. Penile cancer in Maranhão, Northeast Brazil: The highest incidence globally? BMC Urol. 2018, 18, 50. [Google Scholar] [CrossRef] [PubMed]
  3. INCA. Types of Cancer|National Cancer Institute—José Alencar Gomes da Silva—INCA. 2024. Available online: https://www.gov.br/inca/pt-br/assuntos/cancer/tipos/penis (accessed on 13 June 2024).
  4. Fischer, A.H.; Jacobson, K.A.; Rose, J.; Zeller, R. Hematoxylin and eosin staining of tissue and cell sections. CSH Protoc. 2008, 2008, pdb.prot4986. [Google Scholar] [CrossRef] [PubMed]
  5. Melo, R.C.N.; Raas, M.W.D.; Palazzi, C.; Neves, V.H.; Malta, K.K.; Silva, T.P. Whole Slide Imaging and Its Applications to Histopathological Studies of Liver Disorders. Front. Med. 2020, 6, 310. [Google Scholar] [CrossRef] [PubMed]
  6. Atabansi, C.C.; Nie, J.; Liu, H.; Song, Q.; Yan, L.; Zhou, X. A survey of Transformer applications for histopathological image analysis: New developments and future directions. BioMedical Eng. OnLine 2023, 22, 96. [Google Scholar] [CrossRef]
  7. de Matos, J.; Ataky, S.T.M.; de Souza Britto, A.; Soares de Oliveira, L.E.; Lameiras Koerich, A. Machine Learning Methods for Histopathological Image Analysis: A Review. Electronics 2021, 10, 562. [Google Scholar] [CrossRef]
  8. Srinidhi, C.L.; Ciga, O.; Martel, A.L. Deep neural network models for computational histopathology: A survey. Med. Image Anal. 2021, 67, 101813. [Google Scholar] [CrossRef]
  9. Zhou, X.; Li, C.; Rahaman, M.M.; Yao, Y.; Ai, S.; Sun, C.; Wang, Q.; Zhang, Y.; Li, M.; Li, X.; et al. A Comprehensive Review for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks. IEEE Access 2020, 8, 90931–90956. [Google Scholar] [CrossRef]
  10. Komura, D.; Ishikawa, S. Machine Learning Methods for Histopathological Image Analysis. Comput. Struct. Biotechnol. J. 2018, 16, 34–42. [Google Scholar] [CrossRef]
  11. Lauande, M.G.M.; Teles, A.M.; Lima da Silva, L.; Matos, C.E.F.; Braz Júnior, G.; Cardoso de Paiva, A.; Sousa de Almeida, J.D.; da Costa Oliveira, R.M.G.; Brito, H.O.; dos Nascimento, A.P.S.A.; et al. Classification of Histopathological Images of Penile Cancer using DenseNet and Transfer Learning. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)—Volume 4: VISAPP, Virtual, 6–8 February 2022; INSTICC: Lisboa, Portugal; SciTePress: Setúbal, Portugal, 2022; pp. 976–983. [Google Scholar] [CrossRef]
  12. Belfort, F.C.; Silva, I.F.S.d.; Silva, A.C.; Paiva, A.C.d. Detecção de Câncer Peniano em Imagens Histopatológicas usando Redes Neurais Convolucionais em Cascata. In Proceedings of the Anais do XXIII Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS 2023), Salvador, Brazil, 27 June 2023; Brazilian Computing Society—SBC: Porto Alegre, Brazil, 2023. [Google Scholar] [CrossRef]
  13. Vale, J.G.A.d.; Silva, I.F.S.d.; Matos, C.E.F.; Braz Júnior, G.; Lauande, M.G.M. Redes DenseNet com Mecanismos de Atenção Múltipla aplicadas à Classificação Automática de Câncer Peniano em Imagens Histopatológicas. In Proceedings of the Anais do XXIV Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS 2024), Goiânia, Brazil, 25–28 June 2024; Brazilian Computing Society—SBC: Porto Alegre, Brazil, 2024. [Google Scholar] [CrossRef]
  14. Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying Convolution and Attention for All Data Sizes. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 3965–3977. [Google Scholar]
  15. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  16. Borkowski, A.A.; Bui, M.M.; Thomas, L.B.; Wilson, C.P.; DeLand, L.A.; Mastorides, S.M. Lung and Colon Cancer Histopathological Image Dataset (LC25000). arXiv 2019, arXiv:1912.12142. [Google Scholar]
  17. Wu, Y.; Cheng, M.; Huang, S.; Pei, Z.; Zuo, Y.; Liu, J.; Yang, K.; Zhu, Q.; Zhang, J.; Hong, H.; et al. Recent Advances of Deep Learning for Computational Histopathology: Principles and Applications. Cancers 2022, 14, 1199. [Google Scholar] [CrossRef]
  18. Gupta, I.; Nayak, S.; Gupta, S.; Singh, S.; Verma, K.D.; Gupta, A.; Prakash, D. A deep learning based approach to detect IDC in histopathology images. Multimed. Tools Appl. 2022, 81, 36309–36330. [Google Scholar] [CrossRef]
  19. Vikranth, C.S.; Jagadeesh, B.; Rakesh, K.; Mohammad, D.; Krishna, S.; Remya Ajai, A.S. Computer Assisted Diagnosis of Breast Cancer Using Histopathology Images and Convolutional Neural Networks. In Proceedings of the 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 12–14 February 2022; pp. 1–6. [Google Scholar] [CrossRef]
  20. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A Dataset for Breast Cancer Histopathological Image Classification. IEEE Trans. Biomed. Eng. 2016, 63, 1455–1462. [Google Scholar] [CrossRef] [PubMed]
  21. Aresta, G.; Araújo, T.; Kwok, S.; Chennamsetty, S.S.; Safwan, M.; Alex, V.; Marami, B.; Prastawa, M.; Chan, M.; Donovan, M.; et al. BACH: Grand challenge on breast cancer histology images. Med. Image Anal. 2019, 56, 122–139. [Google Scholar] [CrossRef]
  22. Iqbal, S.; Qureshi, A.N.; Alhussein, M.; Aurangzeb, K.; Kadry, S. A Novel Heteromorphous Convolutional Neural Network for Automated Assessment of Tumors in Colon and Lung Histopathology Images. Biomimetics 2023, 8, 370. [Google Scholar] [CrossRef]
  23. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  24. Kumar, A.; Mehta, R.; Reddy, B.R.; Singh, K.K. Vision Transformer Based Effective Model for Early Detection and Classification of Lung Cancer. SN Comput. Sci. 2024, 5, 839. [Google Scholar] [CrossRef]
  25. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
  26. Tummala, S.; Kim, J.; Kadry, S. BreaST-Net: Multi-Class Classification of Breast Cancer from Histopathological Images Using Ensemble of Swin Transformers. Mathematics 2022, 10, 4109. [Google Scholar] [CrossRef]
  27. Sriwastawa, A.; Arul Jothi, J.A. Vision transformer and its variants for image classification in digital breast cancer histopathology: A comparative study. Multimed. Tools Appl. 2024, 83, 39731–39753. [Google Scholar] [CrossRef]
  28. Cruz-Roa, A.; Gilmore, H.; Basavanhally, A.; Feldman, M.; Ganesan, S.; Shih, N.; Tomaszewski, J.; González, F.; Madabhushi, A. Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent. Sci. Rep. 2017, 7, 46450. [Google Scholar] [CrossRef]
  29. Li, Y.; Zhang, K.; Cao, J.; Timofte, R.; Gool, L.V. LocalViT: Bringing Locality to Vision Transformers. arXiv 2021, arXiv:2104.05707. [Google Scholar]
  30. Kou, Y.; Xia, C.; Jiao, Y.; Zhang, D.; Ge, R. DACTransNet: A Hybrid CNN-Transformer Network for Histopathological Image Classification of Pancreatic Cancer. In Proceedings of the Artificial Intelligence: Third CAAI International Conference, CICAI 2023, Fuzhou, China, 22–23 July 2023; Revised Selected Papers, Part II. Springer: Berlin/Heidelberg, Germany, 2024; pp. 422–434. [Google Scholar] [CrossRef]
  31. Mahmood, T.; Wahid, A.; Hong, J.S.; Kim, S.G.; Park, K.R. A novel convolution transformer-based network for histopathology-image classification using adaptive convolution and dynamic attention. Eng. Appl. Artif. Intell. 2024, 135, 108824. [Google Scholar] [CrossRef]
  32. Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M.A. Transfer learning: A friendly introduction. J. Big Data 2022, 9, 102. [Google Scholar] [CrossRef] [PubMed]
  33. Aitazaz, T.; Tubaishat, A.; Al-Obeidat, F.; Shah, B.; Zia, T.; Tariq, A. Transfer learning for histopathology images: An empirical study. Neural Comput. Appl. 2022, 35, 7963–7974. [Google Scholar] [CrossRef]
  34. Sharmay, Y.; Ehsany, L.; Syed, S.; Brown, D.E. HistoTransfer: Understanding Transfer Learning for Histopathology. In Proceedings of the 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Athens, Greece, 27–30 July 2021; pp. 1–4. [Google Scholar] [CrossRef]
  35. Villán, A. Mastering OpenCV 4 with Python: A Practical Guide Covering Topics from Image Processing, Augmented Reality to Deep Learning with OpenCV 4 and Python 3.7; Packt Publishing: Birmingham, UK, 2019; p. 185. [Google Scholar]
  36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  37. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
  38. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR, 2019; Volume 97, pp. 6105–6114. [Google Scholar]
  39. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
  40. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  41. Ansel, J.; Yang, E.; He, H.; Gimelshein, N.; Jain, A.; Voznesensky, M.; Bao, B.; Bell, P.; Berard, D.; Burovski, E.; et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, 27 April–1 May 2024; ASPLOS ’24. Volume 2, pp. 929–947. [Google Scholar] [CrossRef]
  42. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
Figure 1. Proposed method for building models for classifying histopathological images of penile cancer.
Figure 2. Some images from the LC25000 dataset by category.
Figure 3. Some images from the PCPAm dataset by category and magnification.
Figure 4. Neural network architecture developed in this work.
Figure 5. An example of a dense block and a transition layer from the DenseNet architecture.
Figure 6. An example of an MBConv block.
Figure 7. An example of a Transformer block.
Figure 8. Grad-CAM analysis results for samples at 40× and 100× magnifications.
Table 1. Distribution of images from LC25000 according to pathological classification.

Category                          Images
Lung—benign                       5000
Lung—adenocarcinoma               5000
Lung—squamous cell carcinoma      5000
Colon—benign                      5000
Colon—adenocarcinoma              5000
Total                             25,000
Table 2. Distribution of images from PCPAm according to magnification and pathological classification.

Category/Magnification              40×    100×
Normal                              42     42
Cancer (squamous cell carcinoma)    55     55
Total                               97     97
Table 3. Results of each experiment for 40× magnification by neural network architecture, with their respective standard deviations.

Exp.  Network               Magnification  Accuracy       Recall         Precision      F1-Score
1     I * + D * + D * + M   40×            93.95% (7.34)  94.55% (7.27)  94.70% (7.20)  94.54% (6.68)
2     I * + D * + D * + T   40×            92.89% (6.06)  96.36% (4.45)  91.77% (7.19)  93.91% (5.23)
3     I * + D * + T + T     40×            87.58% (4.32)  87.27% (7.27)  90.51% (0.81)  88.73% (4.36)
4     I * + D * + M + T     40×            93.74% (3.95)  98.18% (3.64)  91.77% (4.89)  94.78% (3.27)
5     I * + D * + T + M     40×            90.68% (2.17)  96.36% (4.45)  88.95% (6.04)  92.23% (1.53)
6     I * + D * + M + M     40×            94.95% (5.48)  98.18% (3.64)  94.05% (8.38)  95.78% (4.39)
* Initial convolutions or blocks of DenseNet-201 pre-trained on ImageNet.
Table 4. Results of each experiment for 100× magnification by neural network architecture, with their respective standard deviations.

Exp.  Network               Magnification  Accuracy       Recall          Precision      F1-Score
1     I * + D * + D * + M   100×           93.79% (5.18)  100.00% (0.00)  90.71% (6.88)  94.99% (3.88)
2     I * + D * + D * + T   100×           87.68% (5.09)  90.91% (9.96)   88.19% (4.90)  89.09% (5.17)
3     I * + D * + T + T     100×           89.74% (4.54)  94.55% (7.27)   88.26% (3.57)  91.13% (4.19)
4     I * + D * + M + T     100×           93.79% (2.16)  98.18% (3.64)   91.92% (4.88)  94.77% (1.56)
5     I * + D * + T + M     100×           89.74% (7.19)  92.73% (8.91)   89.62% (6.29)  90.97% (6.56)
6     I * + D * + M + M     100×           89.74% (3.09)  94.55% (4.45)   88.29% (3.56)  91.22% (2.76)
* Initial convolutions or blocks of DenseNet-201 pre-trained on ImageNet.
Table 5. Comparison of our work with other works that classify histopathological images of penile cancer using the PCPAm dataset.

Works                      Magnification  F1-Score
DenseNet-201 [11]          40×            97.39% (2.13)
DenseNet-201 [11]          100×           97.31% (3.62)
DenseNet-121 + SA [12]     40×            91.2% (2.3)
DenseNet-121 + SA [12]     100×           92.4% (1.5)
DenseUSAG [13]             100×           93.1% (4.6)
Ours (I + D + M + M)       40×            95.78% (4.39)
Ours (I + D + D + M)       100×           94.99% (3.88)