Article

A Trustworthy Framework for Skin Cancer Detection Using a CNN with a Modified Attention Mechanism

by Su Myat Thwin 1, Hyun-Seok Park 1,* and Soo Hyun Seo 2,3,*

1 Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea
2 Department of Laboratory Medicine, Seoul National University Bundang Hospital, Seongnam 13620, Republic of Korea
3 Department of Laboratory Medicine, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1067; https://doi.org/10.3390/app15031067
Submission received: 26 December 2024 / Revised: 13 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025

Abstract:
The early and accurate detection of skin cancer can reduce mortality rates and improve patient outcomes, but requires advanced diagnostics. The integration of artificial intelligence (AI) into healthcare enables the precise and timely detection of skin cancer. However, significant challenges remain, including the difficulty of differentiating visually similar skin conditions and the limited availability of diverse, representative datasets. In this study, we proposed DCAN-Net, a novel deep-learning framework designed for the early detection of skin cancer. The model leverages an efficient backbone architecture optimized for capturing diverse skin patterns, using carefully tuned parameters to enhance discrimination capability and refining the extracted features with modified attention modules, thereby prioritizing relevant foreground information while minimizing background noise. Furthermore, the Grad-CAM explainable AI method was employed to highlight the most salient features within dermatoscopic images. The fused optimal feature representations significantly enhanced the dermatoscopic image analysis. When evaluated on the HAM10000 dataset, DCAN-Net achieved a precision, recall, F1-score, and accuracy of 97.00%, 97.57%, 97.10%, and 97.57%, respectively. Moreover, the application of advanced data augmentation techniques mitigated data imbalance issues and reduced false-positive and false-negative rates across the original and augmented datasets. These findings demonstrate the potential of DCAN-Net for improving clinical outcomes and advancing AI-driven skin cancer diagnostics.

1. Introduction

Artificial intelligence (AI) has emerged as a transformative tool in healthcare, offering significant potential for improving outcomes in managing cancers, including skin cancer, which arises from unregulated cell growth [1]. Skin cancer, the most common cancer worldwide, accounts for over half of all cancer cases, with melanoma being the most dangerous form. In the United States, one in five individuals faces a skin cancer diagnosis by the age of 70, and the risk doubles for those with a history of frequent sunburns. Despite these alarming statistics, early detection significantly improves survival rates, emphasizing the critical need for effective diagnostic systems [2]. Skin cancer primarily affects the skin, eyes, and nerve centers, leading to tumor formation, and its identification often requires extensive clinical expertise [3,4]. However, acquiring such expertise is costly and time-consuming, making the development of AI-driven approaches essential for more accessible diagnosis [5,6].
Traditional diagnostic methods, such as clinical examination and advanced imaging techniques including CT, MRI, and ultrasound, are vital in identifying skin cancer, but require skilled professionals for accurate interpretation [7,8]. Self-examinations, while useful, often result in overreactions due to a lack of knowledge. Similarly, the accuracy of clinical diagnoses without technological assistance ranges between 65% and 80%, while dermoscopic images can increase precision but still fall short of delivering the optimal results [9,10,11]. These limitations highlight the need for AI-driven solutions to support clinicians and enhance diagnostic accuracy, especially for diseases such as melanoma, where early detection directly impacts the treatment outcomes and survival rates.
Recent advancements in deep learning and convolutional neural networks (CNNs) have revolutionized the medical field by providing automated, accurate, and cost-effective diagnostic solutions for skin cancer [12,13]. Deep learning models, unlike traditional machine learning, do not require manual feature engineering, making them particularly suited for analyzing complex dermatoscopic images. However, several challenges remain such as the need for large, diverse datasets and the development of robust model architectures capable of addressing domain-specific requirements [14]. Hybrid approaches that combine deep learning with traditional methodologies have shown promise in improving diagnostic precision, particularly in differentiating between benign and malignant lesions [15,16]. These innovations underscore the role of AI in addressing critical gaps in skin cancer detection and treatment.
In this study, we propose the DCAN-Net framework, which integrates a ConvNeXt-based CNN architecture with a customized attention mechanism to enhance diagnostic accuracy. ConvNeXt, a scalable CNN architecture, excels in capturing long-range spatial representations, making it ideal for analyzing dermatoscopic images [17]. The framework leverages explainable AI techniques like Grad-CAM to provide visual justifications for model predictions, fostering transparency and clinician trust. By employing the HAM10000 dataset—a comprehensive collection of dermatoscopic images—we validated the proposed approach, demonstrating its ability to reduce false positives and negatives. This work contributes to the domain of medical image processing by addressing key challenges in early melanoma detection and advancing methodologies for skin cancer diagnosis.
In the proposed DCAN-Net framework, there are several significant steps to enhance skin cancer detection using the HAM10000 dataset.
  • Incorporation of the ConvNeXt-based model: Provides high performance by effectively capturing cancer patterns together with their contextual information.
  • Modified attention mechanism: Directs the model’s focus to relevant image regions, improving diagnostic precision and reducing errors.
  • Explainable AI with Grad-CAM: Provides visual justification of the model’s outcomes, enhancing transparency and clinician trust.
  • Comprehensive training over the HAM10000 dataset: Utilizes a diverse dataset to improve the generalization of the model to new images.
  • Optimal feature collection and fusion: Combines the most informative features for specialized dermatoscopic analysis.
  • Rigorous evaluation: Validates the model’s effectiveness through extensive testing and performance metrics.
The contributions of this study are as follows:
  • Proposed DCAN-Net framework: This study introduces DCAN-Net, an advanced deep learning framework for skin cancer detection. It incorporates a tailored attention mechanism to focus on critical regions of dermatoscopic images, improving diagnostic accuracy and providing visual attention maps for enhanced explainability.
  • Integration of ConvNeXt and modified Grad-CAM: ConvNeXt serves as the core feature extraction architecture, offering state-of-the-art efficiency, while a modified Grad-CAM explainable AI method enhances interpretability. This combination improves diagnostic precision and empowers clinicians with insights into the model’s decision-making process.
  • Parallel feature fusion strategy: An advanced parallel feature fusion strategy amplifies essential feature extraction, enabling accurate skin cancer classification by distilling critical details from dermatoscopic images.
  • Comprehensive ablation study: Rigorous ablation studies evaluate the impact of hyperparameters and model components, resulting in reduced false positives and negatives and overall improved accuracy.
  • Evaluation on the HAM10000 dataset: Extensive testing on the HAM10000 dataset including its augmented version was conducted to validate the model’s robust generalization, precision, and potential to transform healthcare diagnostics.
  • Addressing limitations of traditional CNNs: The study tackles challenges in traditional CNNs including the lack of explainability, overfitting, and sensitivity to data quality. It ensures improved reliability for diverse real-world clinical scenarios.
  • Enhanced trustworthiness: By incorporating Grad-CAM for visual explanations, robust preprocessing, balanced training, and continuous evaluation, DCAN-Net can establish a transparent, reliable, and clinically applicable system for skin cancer detection.
The remainder of the article is organized as follows. The mainstream approaches are briefly explained in Section 2, and the technical information of the proposed model is provided in Section 3. The empirical results are presented in Section 4, and Section 5 concludes the paper.

2. Related Work

Skin cancer is one of the deadliest diseases worldwide, and its accurate and timely detection is crucial for survival. Researchers have used various computer vision-based approaches to accurately detect skin cancers at an early stage. This section first describes various attention mechanisms; the methods used for skin cancer detection, including conventional machine learning, deep learning, and hybrid techniques, are then presented.

2.1. Analysis of Existing Attention Mechanisms for Skin Cancer Detection

In skin cancer detection, attention mechanisms play a crucial role in enhancing the performance of CNNs by directing the focus of the model to the most relevant regions of dermatoscopic images.
Self-attention mechanisms: Self-attention mechanisms, such as those employed in transformers, compute the attention scores across all positions in the input image, enabling the model to assess the importance of each pixel relative to every other. This capability allows self-attention mechanisms to capture relationships between distant pixels, which is particularly beneficial for identifying subtle patterns spanning a lesion. Their flexibility allows them to adapt to various image features, thereby enhancing the generalizability of the model. However, self-attention mechanisms are computationally intensive and require a substantial amount of memory, particularly for processing high-resolution images. In addition, their increased complexity makes it more challenging to train and fine-tune models. Although highly effective at capturing intricate patterns, the significant computational overhead limits their practicality for real-time diagnostic systems.
Spatial attention mechanisms: These mechanisms focus on identifying important spatial regions in images. They typically generate a spatial attention map that highlights areas of interest, helping the model focus on critical regions such as lesion boundaries and texture variations. They are generally less computationally demanding than self-attention mechanisms; however, they may not capture long-range dependencies as effectively. There is also a risk that the model may overemphasize certain regions while neglecting others that are relevant. They are particularly useful for enhancing the model’s focus on lesion characteristics and improving diagnostic accuracy.
Channel attention mechanisms: These systems focus on the importance of different feature channels, allowing the model to weigh the channels that carry more relevant information. They enhance important features (e.g., color and texture) that are indicative of malignancy and can be easily integrated into existing CNN architectures. Although this is effective in emphasizing certain features, it may not adequately address the spatial localization. These systems can improve the sensitivity of the model to specific diagnostic features, thereby aiding in differentiating between benign and malignant lesions.
Dual attention mechanisms: This type of mechanism combines both spatial and channel attention, enabling the model to simultaneously focus on significant regions and features. It balances spatial and feature importance, providing a holistic approach to image analysis. This leads to an improved overall performance by leveraging the strengths of both spatial and channel attention. Combining these two attention mechanisms can increase the complexity and computational requirements of the model. This mechanism offers a balanced approach, making it highly effective for the detailed and accurate detection of skin cancer.
Hierarchical attention mechanisms: These systems apply attention at multiple levels of the network, progressively refining the focus. They gradually narrow the attention to the most critical features and regions through multiple layers, facilitating a detailed and nuanced image analysis; however, they are complex and challenging to train effectively, requiring additional computational resources. Progressively focusing on the relevant details leads to a more precise diagnosis.
Each attention mechanism offers distinct advantages and has specific limitations in skin cancer detection. In the proposed DCAN-Net framework, a systematic evaluation of these mechanisms ensures the selection of the most effective approach, thereby enhancing the diagnostic accuracy and reliability of the model for skin cancer detection. A thorough analysis was conducted to ensure that the selected attention mechanism significantly contributed to the performance of the developed model, making it a robust and trustworthy tool for early skin cancer detection.

2.2. Conventional Machine Learning Methods

In the early era of artificial intelligence, researchers employed conventional machine learning methods due to a lack of data availability and computational resources. For instance, Taufiq et al. [18] presented a real-time mobile-enabled healthcare system called the m-skin doctor for skin cancer detection. The system uses computer vision and image processing techniques, including Gaussian filter noise removal, GrabCut-based segmentation, and a support vector machine (SVM) for classification. The system achieved 80% sensitivity and 75% specificity; however, these limited rates restrict its real-world implementation. Vidhyalakshmi and Kanchana [19] presented an adaptive machine learning model with an ensemble SVM for skin cancer classification. This ensemble model achieved a higher performance with reduced inference time and cost, and was evaluated using the PAD-UFES-20 dataset. Jaisakthi et al. [20] proposed an automated skin lesion segmentation approach for dermatologists that used dermoscopic imaging to detect and classify skin lesions. This approach involves two steps: preprocessing and segmentation. In the preprocessing step, unnecessary details and noise are removed, while the GrabCut algorithm is used for segmentation.
The k-means clustering algorithm and color features from the training images improve the boundaries of the segments. This method enhances the diagnostic capability of dermatologists and aids in the early detection and classification of skin cancers. Masood et al. [21] proposed a deep-learning-based system for the early detection of melanoma lesions. The system uses clinical images, preprocessed to reduce illumination and noise effects, and a pre-trained CNN to distinguish between melanoma and benign cases. The experimental results of the developed system outperformed state-of-the-art methods in diagnostic accuracy, making it a valuable tool for dermatologists in the early diagnosis of melanoma. Initially, conventional methods were considered milestones in computer-vision-based methods for skin cancer detection. However, conventional methods require handcrafted feature engineering, exhibit limited performance, have high false rates, and require domain experts.

2.3. Deep Learning Methods

Deep CNNs show potential for skin cancer classification and overcome the limitations of conventional machine learning methods. For instance, Esteva et al. [22] presented a CNN-based model that was trained on 129,450 clinical images and achieved a performance comparable to dermatologists in two binary classification use cases. Such artificial intelligence could extend the dermatologists’ reach beyond clinics, as 6.3 billion smartphone subscriptions were projected to exist by 2021. To differentiate between melanoma and benign cases, Nasr-Esfahani et al. [23] employed clinical image-based methods that were preprocessed to reduce light and noise effects. Their method performed better than other existing approaches in terms of diagnostic accuracy, making it a helpful tool for dermatologists in the early diagnosis of melanoma. Similarly, a deep belief network and self-advised SVM were utilized in a self-supervised model for melanoma diagnosis, with a bootstrapping technique adopted to increase generalization and decrease redundancy. Studies revealed that this approach performed better than other systems, including KNN and SVM. For melanoma identification, a straightforward CNN can be used with images preprocessed to remove noise. Ghanshala et al. [24] suggested that a good visual examination model is essential for correctly classifying and detecting diseases. The creation and testing of algorithms have enhanced image-classification systems with the availability of multiple open image datasets; on a recently developed skin cancer database, CNNs outperformed other established AI technologies. Their study analyzed skin cancer images from the International Skin Imaging Collaboration (ISIC) dataset to classify the images as malignant or benign using a prebuilt CNN; the highest accuracy (83.78%) was achieved using 128 × 128 images. For accurate skin lesion classification, Hameed et al. [25] proposed a real-time mobile-enabled skin lesion classification expert system, “i-Rash”, using a CNN and cloud-server architecture. The system aims to diagnose acne, eczema, and psoriasis in remote locations. The model used SqueezeNet with a transfer learning approach and was trained on 1856 images, achieving an accuracy, sensitivity, and specificity of 97.21%, 94.42%, and 98.14%, respectively. Subramanian et al. [26] employed a 16-layer CNN model to detect and classify cancers based on clinical images and obtained promising results, maintaining false-negative rates below 10% and achieving a precision above 80%. Malo et al. [27] proposed a VGG-16-based model for effective skin cancer detection. They utilized the ISIC dataset with 2460 colored images and obtained an accuracy of 87.6%, demonstrating the significant potential of CNNs in skin cancer detection.
Nampalle and Raman [28] proposed a computer-aided diagnosis (CAD) system to categorize medical images by adopting a transfer learning strategy with a pretrained MobileNet model on the ISIC dataset. Their model achieved promising performance compared with other existing methods. Li and Seo [29] built a Mask R-CNN model for lesion segmentation using pre-trained weights from the Microsoft COCO dataset; experiments on the benchmark ISIC 2018 dataset demonstrated 96% accuracy in lesion boundary segmentation and 80% balanced multiclass accuracy. Agrahari et al. [30] likewise addressed skin cancer, a major health concern with 125,000 new melanoma cases annually for which early detection can decrease mortality rates. Their multiclass skin cancer detection system, using a pretrained MobileNet model and the HAM10000 ISIC dataset, achieved high categorical accuracies of 80.81%, 91.25%, and 96.26%, making it a fast and inexpensive method for clinical advancement.

2.4. Hybrid Learning Methods

Deep CNNs have shown potential in skin cancer classification. Inspired by the recent performance of deep-learning-based approaches, researchers are now working on developing hybrid methods, in which an attention mechanism is developed and integrated with numerous deep-learning models for various purposes including image classification [31] and machine translation [32]. This attention module has been used as an advanced approach to acquire long-range feature interactions and increase the CNN feature representation capabilities [33]. This attention mechanism plays a significant role in human visual perception because it can allocate available resources to selectively focus on processing the salient part instead of the entire scene [34,35]. Skin cancer is a global issue, and dermoscopy-based classification is an effective method for diagnosing skin lesions.
However, challenges such as interclass similarity, intraclass variation, high class imbalance, and a lack of focus on lesion areas affect the classification results. To address these issues, Qian et al. [36] proposed a deep CNN dermatoscopic image classification method that extracts multiscale fine-grained features and uses class-specific loss weighting to address category imbalance. The model was evaluated on the HAM10000 dataset, achieving an accuracy (ACC) of 91.6% and an AUC of 97.1%, demonstrating its potential for dermatoscopic classification tasks. Singh et al. [37] presented a deep learning model using VGG19 and self-attention for skin cancer classification and obtained improved performance on the HAM10000 dataset; however, its accuracy requires further improvement. Castro-Fernández et al. [38] proposed an approach to improve the performance of an existing lightweight MobileNetV2 model for skin cancer monitoring that included an attention mechanism to enhance the model learning capability. A fine-tuning strategy was adopted, including pretraining weights with an autoencoder using the HAM10000 dataset. Their approach, combining a condensed MobileNetV2 model with a coordinate attention mechanism, achieved an accuracy of 83.93%. Methods based on conventional machine learning, deep learning, and hybrid models have achieved promising results; however, this field requires more attention to increase model performance for early and effective skin cancer detection and to save human lives.

3. Proposed DCAN-Net

This section describes the two fundamental steps of our research methodology: preprocessing, and proposed model selection and testing. Preprocessing prepares and enhances the dataset to ensure its quality and suitability for subsequent analysis. Following the preprocessing phase, we delve into the core of our research, where we carefully chose and evaluated the efficacy of the proposed model. This step involves rigorous testing, validation, and assessment, as described in the following subsections.

3.1. Preprocessing

Data preprocessing is an important step in training machine learning approaches because it enhances the quality of the input data. Classification accuracy can decrease when a CNN model is trained on raw input images. To mitigate this problem, data augmentation was performed during the preprocessing phase, generating new images from the originals by varying aspects such as brightness, contrast, orientation, position, and scale, as shown in Figure 1. We employed a dataset augmentation approach similar to that of Aladhadh et al. [39], combining brightness adjustment, contrast enhancement, and geometric transformations. For contrast enhancement, we employed contrast-limited adaptive histogram equalization (CLAHE) with a clip limit of 2.0 and a tile grid size of 8 × 8 to evenly distribute intensity values across the image, enhancing visibility while avoiding noise amplification. Brightness adjustment involved linearly scaling the pixel intensity values within a range of −20% to +20% of the original brightness, improving model robustness under varying lighting conditions. For geometric transformations, we applied rotations of up to ±10 degrees, along with horizontal and vertical flipping. These transformations were chosen to simulate natural variations in dermatoscopic images and improve the generalization of the proposed model. A two-step approach was used, preprocessing followed by CNN training, and the results were compared with other benchmarks with and without data augmentation.
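To make the augmentation settings concrete, the sketch below reproduces the pipeline described above. The paper does not specify an implementation, so the use of OpenCV and the function structure are our own illustrative assumptions; only the parameter values (clip limit 2.0, 8 × 8 tiles, ±20% brightness, ±10° rotations, flips) come from the text.

```python
import cv2
import numpy as np

def augment(img, rng=None):
    """Return one augmented variant of a BGR dermatoscopic image (uint8)."""
    if rng is None:
        rng = np.random.default_rng()
    # CLAHE on the lightness channel: clip limit 2.0, 8 x 8 tile grid
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
    # Brightness: linear scaling within -20% to +20% of the original intensity
    scale = rng.uniform(0.8, 1.2)
    img = np.clip(img.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    # Geometric: rotation of up to +/-10 degrees plus random flips
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-10, 10), 1.0)
    img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
    if rng.random() < 0.5:
        img = cv2.flip(img, 1)  # horizontal flip
    if rng.random() < 0.5:
        img = cv2.flip(img, 0)  # vertical flip
    return img
```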
Brightness adjustment: Various lighting conditions can lead to fluctuations in image brightness, resulting in images that are either too bright or too dark; the chosen images may therefore exhibit variations caused by differences in illumination. Applying the brightness scaling transformation given in Equation (1) with different values of the factor a allowed us to address this illumination-induced variation.
$$g(x, y) = a \times f(x, y) \quad (1)$$
Contrast enhancement: This process mitigates the effect of contrast variations in images resulting from changing lighting conditions. The contrast stretching procedure described in Equation (2) was applied to modify the contrast variations using various factors, as illustrated in Figure 1.
$$g(x, y) = \begin{cases} a_1\, f(x, y), & f(x, y) < r_1 \\ a_2\,(f(x, y) - r_1) + s_1, & r_1 \le f(x, y) < r_2 \\ a_3\,(f(x, y) - r_2) + s_2, & f(x, y) \ge r_2 \end{cases} \quad (2)$$
where $g(x, y)$ is the output pixel derived from the input pixel $f(x, y)$; $s_1$, $s_2$, $r_1$, and $r_2$ are the parameters used for contrast adjustment; $a_1$, $a_2$, and $a_3$ are the scale factors for the three grayscale segments, formulated as $s_1/r_1$, $(s_2 - s_1)/(r_2 - r_1)$, and $(L - s_2)/(L - r_2)$, respectively; and $L$ is the maximum gray-level value.
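A minimal NumPy sketch of the piecewise-linear stretching in Equation (2) follows; the function name and the final clipping step are illustrative assumptions.

```python
import numpy as np

def contrast_stretch(f, r1, s1, r2, s2, L=255):
    # Piecewise-linear contrast stretching per Equation (2), with
    # a1 = s1/r1, a2 = (s2 - s1)/(r2 - r1), a3 = (L - s2)/(L - r2).
    a1 = s1 / r1
    a2 = (s2 - s1) / (r2 - r1)
    a3 = (L - s2) / (L - r2)
    f = f.astype(np.float32)
    g = np.where(f < r1, a1 * f,
                 np.where(f < r2, a2 * (f - r1) + s1,
                                  a3 * (f - r2) + s2))
    return np.clip(g, 0, L).astype(np.uint8)  # keep output in the valid gray range
```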
Geometric transformations: This process encompasses scaling, translation, and rotation operations, which are applied to each image within the dataset to create new images. In the context of the CNN architecture, this process holds significant value because it enables the model to perceive the same object from various angles, bolstering its generalization capability. All of these preprocessing steps are therefore fundamental for performance enhancement.

3.2. Proposed Model Selection

The proposed model incorporates key modifications to the ConvNeXt architecture [40], enhancing its capability to detect skin cancer in dermatoscopic images. A significant improvement involves integrating a modified channel attention mechanism inspired by the dual-fire attention network (DFAN) [41]. Unlike the original ConvNeXt architecture, which primarily focuses on extracting features using standard convolutional layers, the proposed model replaces the conventional 7 × 7 convolution layer with a 5 × 5 separable convolution. This substitution reduces the computational overhead while maintaining the feature extraction efficiency. Additionally, the spatial attention module was fused with the intermediate outputs of the channel attention module via element-wise product operations. This dual-attention mechanism, illustrated in Figure 2, improved the model’s ability to localize and emphasize critical regions within the input images, enabling more precise detection of cancerous lesions.
The modified architecture diverges from the original ConvNeXt by employing stacked dual-attention modules after every two successive convolutional layers, a novel design that optimizes feature representation. While the base ConvNeXt architecture relies on depthwise and 1 × 1 convolutions with GELU activation and layer normalization [42], the proposed model integrates channel attention to weigh the importance of the channels in the feature maps (denoted as $Att_C$) as well as spatial attention to focus on the most salient image regions. This synergistic attention fusion offers a unique advantage over the standard ConvNeXt by refining the model’s sensitivity to the subtle visual cues in dermatoscopic images. Moreover, the patchify stem design of ConvNeXt is preserved, ensuring efficient multi-scale feature extraction while the attention mechanisms enhance the representation capabilities, particularly for medical image analysis.
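A minimal Keras sketch of the modified block is shown below. Only the 5 × 5 separable convolution replacing the 7 × 7 depthwise convolution is taken from the description above; the layer ordering and the 4× pointwise expansion follow the standard ConvNeXt design [40] and are otherwise assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def modified_block(x, dim):
    # ConvNeXt-style block with the 7x7 depthwise convolution replaced
    # by a 5x5 separable convolution, as described in the text.
    shortcut = x
    x = layers.SeparableConv2D(dim, kernel_size=5, padding="same")(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    x = layers.Dense(4 * dim)(x)   # pointwise (1x1) expansion
    x = layers.Activation("gelu")(x)
    x = layers.Dense(dim)(x)       # pointwise (1x1) projection back to dim
    return layers.Add()([shortcut, x])  # residual connection
```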
The two-phase (offline and online) framework introduced further differentiation from the original ConvNeXt. In the offline phase, preprocessing steps, such as augmentation with variations in brightness, contrast, and orientation, are applied to improve the model’s generalizability on the HAM10000 dataset. Grad-CAM is utilized to identify critical regions in the training images, with these features fused into the attention mechanism for improved specialization in dermatoscopic analysis. In the online phase, DCAN-Net leverages the pre-trained dual-attention-enhanced model for real-time predictions, combining accuracy with explainability by using Grad-CAM visualizations. This ensures that clinicians can trust the system’s diagnostic outputs, a feature absent in the original ConvNeXt. Overall, the proposed design achieves a balance between high diagnostic precision and computational efficiency, making it highly suitable for resource-constrained medical environments.
To capture the improved feature maps $F_{M'}$, we combined the spatial attention feature maps $Att_S$ with the input feature maps $F_M$ using a residual skip connection, accomplished through elementwise addition. This approach notably increased the feature representation capability, allowing for a more accurate recognition of essential regions. Mathematically, the feature maps for channel attention, spatial attention, and refined attention can be expressed as follows:
$$Att_C^{W \times H \times C} = A_C\left(F_M^{W \times H \times C}\right) \otimes F_M^{W \times H \times C}, \quad (3)$$
$$Att_S^{W \times H \times C} = A_S\left(Att_C^{W \times H \times C}\right) \otimes Att_C^{W \times H \times C}, \quad (4)$$
$$F_{M'}^{W \times H \times C} = Att_S^{W \times H \times C} \oplus F_M^{W \times H \times C}, \quad (5)$$
In Equations (3)–(5), $H$, $W$, and $C$ denote the height, width, and number of channels of the feature maps, respectively, and $A_C$ and $A_S$ denote the channel and spatial attention modules. We acquired the refined feature maps $F_{M'}$ by integrating the spatial attention feature maps $Att_S$ with the input feature maps $F_M$, as shown in Figure 3. This strategy enabled us to increase the feature representation capability and improve the overall performance of the skin cancer model.
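The sketch below illustrates one possible realization of Equations (3)–(5) in Keras. The internals of $A_C$ and $A_S$ (a squeeze-and-excitation-style channel gate and a 7 × 7 convolution over pooled channel statistics) are assumptions for illustration, since the text defines only the fusion structure.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(f_m, reduction=8):
    # Att_C = A_C(F_M) (x) F_M, Equation (3)
    c = f_m.shape[-1]
    w = layers.GlobalAveragePooling2D()(f_m)
    w = layers.Dense(c // reduction, activation="relu")(w)
    w = layers.Dense(c, activation="sigmoid")(w)
    w = layers.Reshape((1, 1, c))(w)
    return layers.Multiply()([f_m, w])

def spatial_attention(att_c):
    # Att_S = A_S(Att_C) (x) Att_C, Equation (4)
    avg_pool = tf.reduce_mean(att_c, axis=-1, keepdims=True)
    max_pool = tf.reduce_max(att_c, axis=-1, keepdims=True)
    m = layers.Concatenate()([avg_pool, max_pool])
    m = layers.Conv2D(1, kernel_size=7, padding="same", activation="sigmoid")(m)
    return layers.Multiply()([att_c, m])

def dual_attention(f_m):
    att_c = channel_attention(f_m)
    att_s = spatial_attention(att_c)
    # F_M' = Att_S (+) F_M: residual skip connection, Equation (5)
    return layers.Add()([att_s, f_m])
```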
ConvNeXt was selected for skin cancer detection due to its outstanding performance, efficiency, scalability, and proven reliability, making it an ideal choice for this medical application. It consistently achieved state-of-the-art results in complex image classification tasks, demonstrating its ability to extract intricate features and discern patterns in dermatoscopic images, which is crucial for identifying subtle visual cues indicative of skin cancer. ConvNeXt’s efficiency ensures a low computational overhead while maintaining high accuracy, essential for real-time diagnosis and processing large datasets in healthcare environments. Its extensive validation within the research community underscores its robustness and reliability, while its generalizability across diverse datasets and tasks is advantageous for handling the inherent variability in dermatoscopic images. Despite the suitability of ConvNeXt, a novel model was developed to address the unique demands of skin cancer detection by incorporating a dual-attention mechanism to improve the feature localization and precision. This modification enhances the model’s focus on relevant regions within images, further optimizing its performance for skin cancer diagnosis and ensuring that it meets both the accuracy and interpretability requirements for clinical applications.

4. Results and Discussion

This section presents a comprehensive evaluation of the proposed model for skin cancer detection using the HAM10000 dataset. The effectiveness of the model and its potential to revolutionize early skin cancer diagnosis were assessed using various performance metrics and comparative analyses. In this context, we first explain the experimental settings, evaluation metrics, and datasets, then present reproducible results and compare them with state-of-the-art approaches.

4.1. Experimental Setting

This section describes the experimental setup and hyperparameter selection. The training procedure was carefully fine-tuned using the Adam optimizer with an initial learning rate of 0.001, a batch size of 32, and 50 epochs, chosen to guarantee convergence while avoiding overfitting. To implement the proposed model, we used a high-performance NVIDIA GeForce RTX 3090 GPU, enabling efficient training. The implementation and experiments were built on the TensorFlow deep-learning framework. The experimental settings employed in this study were carefully planned to support a rigorous evaluation of the ConvNeXt-based model and its potential transformative influence on early skin cancer diagnosis. The main objective of the proposed model is to detect the affected skin area at an early stage; an automatic intelligent model is highly desirable to assist the dermatology department and patients before the cancer reaches its final stage. Therefore, a deep learning technique was developed that learns significant features of diverse skin cancer diseases and identifies the type of illness. To tune the pretrained model, we removed the upper layer and updated the remaining layer parameters using our skin dataset. In the initial epochs, the model learns new patterns and updates the randomly initialized weights to minimize errors. Through backpropagation, the model then refines the weight values and converges toward a local minimum using various filtering strategies and fusion over variant positions.
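For reproducibility, the stated hyperparameters translate into the following Keras sketch, where `model`, `train_ds`, and `val_ds` are placeholders for DCAN-Net and the prepared tf.data datasets (the loss function is our assumption for a multiclass setting):

```python
import tensorflow as tf

def compile_and_train(model, train_ds, val_ds):
    # Hyperparameters as stated above: Adam optimizer, learning rate 0.001,
    # batch size 32, and 50 training epochs.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(
        train_ds.batch(32),
        validation_data=val_ds.batch(32),
        epochs=50,
    )
```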

4.2. Dataset Description

The HAM10000 dataset comprises 10,015 dermatoscopic images collected over 20 years from two distinct locations: the Department of Dermatology at the Medical University of Vienna, Austria, and the skin cancer clinic of Cliff Rosendahl in Queensland, Australia. Widely regarded as a benchmark in the field of skin cancer detection and analysis, it represents seven different types of skin lesions, making it a comprehensive and diverse resource for training and evaluating machine learning models. Its broad acceptance within the research community stems from the high-quality annotations provided by experienced dermatologists and its representation of a wide spectrum of real-world skin conditions. Furthermore, the dataset’s size, variety, and availability have enabled researchers to develop and compare innovative methods for skin lesion classification and segmentation effectively. While other datasets exist in the domain, such as the ISIC Challenge datasets, HAM10000 is considered a reliable standard for benchmarking and is particularly well-suited for studies like ours that focus on improving the early detection and classification of skin cancer. The seven classes are actinic keratoses (Akiec), basal cell carcinoma (Bcc), benign keratosis-like lesions (Bkl), dermatofibroma (Df), melanoma (Mel), melanocytic nevi (Nevi), and vascular lesions (Vasc). It is a highly imbalanced and challenging dataset. Various preprocessing steps were used to improve visibility and to increase the number of samples, enhancing the generalization capability of the model and reducing the data imbalance.
For augmentation, only the Akiec, Bcc, Df, and Vasc classes were considered because their training samples were very limited, as shown in Table 1. The data were then split, with 70% used to train the model, 20% for validation, and 10% for testing. Figure 4 shows sample images from the HAM10000 dataset.
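The 70/20/10 partition can be reproduced with a stratified two-stage split, sketched below with hypothetical `images` and `labels` arrays (stratification and the fixed seed are our assumptions, not stated in the text):

```python
from sklearn.model_selection import train_test_split

# First hold out 10% for testing, then carve 20% of the full set
# (2/9 of the remaining 90%) out for validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    images, labels, test_size=0.10, stratify=labels, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=2 / 9, stratify=y_tmp, random_state=42)
```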

4.3. Model Performance Metrics

This subsection describes the metrics used to evaluate the effectiveness and robustness of the proposed model. We used an extensive range of performance indicators: accuracy, precision, recall, and F1-score.
Accuracy assessment is a critical parameter for evaluating the overall efficacy of the skin cancer detection models. This metric quantifies the ratio of accurately classified skin cancer instances to the total number of instances included in the dataset. Mathematically, the accuracy is determined by the following calculation:
$$\text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total number of predictions}} \quad (6)$$
A high accuracy score indicates that the model detected skin cancer and noncancer instances well, but additional measures are required for imbalanced datasets.
Precision is a metric that quantifies the proportion of correctly identified positive cases relative to the overall number of positive predictions generated by the skin cancer detection model. In the context of our study, precision plays a crucial role in evaluating the capacity of the model to accurately detect instances of skin cancer while minimizing the occurrence of false-positive classifications for noncancer cases. Precision was computed using the following formula:
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \quad (7)$$
Recall, sometimes referred to as sensitivity or the true positive rate, measures the capacity of the model to accurately identify all skin cancer occurrences in a dataset, i.e., the proportion of positive instances that are correctly identified. Recall is defined as:
$$\text{Recall (Sensitivity)} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \quad (8)$$
A higher recall indicates that the model has a lower rate of missing positive instances, meaning that it is better at capturing all instances of the positive class, which is crucial in medical diagnosis to avoid missing potential cases of disease.
The F1-score combines precision and recall, offering a consolidated measure for evaluating the overall performance of a model. It is the harmonic mean of precision and recall, computed using the following formula:
$$F1\text{-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (9)$$
When the positive and negative classes are imbalanced, the F1-score helps balance the false positives and negatives in the datasets.
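In practice, all four metrics defined in Equations (6)–(9) can be computed in one pass, for example with scikit-learn; here `y_true` and `y_pred` are hypothetical label arrays, and weighted averaging is one reasonable choice for imbalanced classes:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true and y_pred are hypothetical arrays of true and predicted class labels.
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")
print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```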

4.4. Results Evaluation with and Without Preprocessing

This subsection describes the utilization of a pretrained ConvNeXt model coupled with spatial and channel attention mechanisms to classify skin cancer. The proposed model was trained for 30 epochs on the target dataset, before and after preprocessing. The results presented in Table 2 demonstrate the effectiveness of the proposed model on the original dataset without preprocessing: the model obtained a lower performance for the Mel class, whereas the Vasc and Df classes had a higher performance, and it achieved an average accuracy of 91.85%.
Figure 5 presents the accuracy of the proposed model without preprocessing, showing the progression of the training and validation accuracy over time. The x-axis represents the number of epochs, and the y-axis represents the accuracy and loss values. In this visualization, the blue lines represent the training accuracy and loss, whereas the orange lines represent the validation accuracy and loss. Initially, the validation accuracy was higher than the training accuracy; however, this trend reversed after a few epochs. Specifically, after eight epochs, the training accuracy exceeded the validation accuracy. Our proposed model achieved a training accuracy of 96% and a validation accuracy of 92%. Throughout the experiment, the validation loss remained consistently higher than the training loss, as reflected in the loss curves for both phases.
The classification results reported in Table 3 indicate the performance of the proposed model in terms of precision, recall, and F1-score to further validate the model performance over the original dataset without any preprocessing steps.
Figure 6 illustrates the training and validation accuracies obtained after applying the preprocessing techniques. The initial training and validation accuracies were 55% and 75%, respectively. As training progressed through each epoch, both the training and validation accuracies steadily improved. By the final epoch, the training accuracy reached 98.50% and the validation accuracy reached 98%. Additionally, Figure 6 shows a significant reduction in both the training and validation losses over the course of the epochs, indicating improved model performance. According to the evaluation results, the system with preprocessing steps could significantly reduce the training and validation losses compared to the system without preprocessing steps.
Table 4 presents the performance of the proposed model and demonstrates the advantages of the data preprocessing steps. It is worth noting that the performance of our skin cancer classification model improved substantially following preprocessing. Table 4 provides a detailed confusion matrix indicating the correct and incorrect predictions for each class. Upon examination, it becomes apparent that the Bcc class exhibited the lowest performance, whereas the Vasc and Df classes demonstrated the highest performance, underscoring the varying degrees of success in classifying distinct categories within the target dataset. In terms of overall test performance, the proposed model registered an accuracy of 97.57%, reflecting its effectiveness in making correct predictions across all classes after applying the preprocessing steps.
Table 5 presents the results of validating the model performance using the preprocessed dataset. Compared to the results without preprocessing shown in Table 3, only the recall score of the Akiec class was reduced, by 0.01, while all other classes exhibited improved performance.
To assess the strength of the proposed model and identify the optimal optimizer, we conducted numerous experiments using different backbone models and optimizers. After comprehensive experiments on various combinations, we concluded that Adam performed better than the other optimizers, as observed with ResNet. Based on these results, we also used Adam for the proposed model, which demonstrated remarkable performance in skin cancer detection.

4.5. Results Comparison with State-of-the-Art Methods

A comparison between the proposed skin cancer classification model and other existing models is presented in Table 6. Compared to the current models, the proposed model obtained a promising performance in all of the evaluation metrics.
Table 7 presents comparison results with state-of-the-art models. The proposed model achieved a precision of 96.42%, an F1-score of 96.71%, and average recall and accuracy scores of 97.57%. The proposed model outperformed the benchmark method, obtaining higher precision, recall, F1-score, and accuracy values by 1%, 1.07%, 0.10%, and 1.43%, respectively.

4.6. Ablation Study

As shown in Table 8, various ablation studies were conducted to determine the optimal skin cancer detection model; the proposed model was evaluated under four different settings. In setting (1), ConvNeXtBase alone was employed and obtained the worst results, with an average accuracy of 96.40%. Setting (2) used ConvNeXtBase with a channel attention mechanism and achieved a higher performance than setting (1). The performances of settings (3) and (4) were better than that of setting (2). In setting (3), the spatial attention mechanism was employed with ConvNeXtBase, achieving an improved performance compared with settings (1) and (2) but a lower performance than setting (4). Setting (4), the proposed model coupled with both channel and spatial attention mechanisms, achieved the best performance throughout the experiment. Table 8 shows the model effectiveness analysis obtained by integrating different modules with ConvNeXtBase.

4.7. Visualization and Comparison of the Results of the Proposed Model

In this study, the Grad-CAM (gradient-weighted class activation mapping) visualization approach was utilized to assess the interpretability and localization performance of the proposed skin cancer classification model. Grad-CAM generates heat maps that highlight the regions in the input image that contribute most significantly to the model’s prediction, providing valuable insight into the decision-making process. The HAM10000 dataset, which consists of a large collection of annotated skin lesion images, was used to evaluate the model. A key advantage of Grad-CAM is its ability to pinpoint the specific areas of the images that correspond to the skin cancer lesions, offering a transparent way to visualize the model’s focus during classification. Figure 7 displays original images from the HAM10000 dataset alongside their corresponding heat maps and the final model predictions. The heat maps were overlaid on the original images to show which parts of the images the model emphasized in making its decisions. These visualizations are critical in confirming the model’s ability to accurately localize lesions. Each class, including melanoma, basal cell carcinoma (BCC), and squamous cell carcinoma (SCC), was represented with its corresponding heat map. The heat maps clearly demonstrated that the proposed model effectively focused on the relevant areas of the lesions, making it a reliable tool for skin cancer classification.
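For reference, a standard Grad-CAM computation for a Keras model is sketched below; the function structure and layer-selection interface are assumptions, as the text does not detail the implementation.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    # Gradient of the class score with respect to the chosen convolutional
    # feature maps, channel-weighted by the pooled gradients and ReLU-ed.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))            # pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)  # weighted sum of maps
    cam = tf.nn.relu(cam)
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()      # normalized heat map
```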
The visual results from the Grad-CAM heat maps highlight the robustness of the proposed model as it consistently identifies and classifies skin lesions with high accuracy. The model’s ability to both classify and localize lesions reinforces its practical potential for use in clinical applications, where interpretability and accurate lesion localization are crucial. Overall, the results confirm that the proposed model is both effective and reliable for skin cancer classification, showing strong performance in lesion detection and classification across different skin cancer types.

4.8. Pros and Cons of the Proposed Model

The proposed model modification, integrating a ConvNeXt-based CNN architecture with modified channel and spatial attention mechanisms, offers distinct advantages tailored to the specific challenges of skin cancer detection. The inclusion of attention mechanisms enhances the model’s ability to focus on relevant regions of dermatoscopic images, leading to improved diagnostic precision and reduced false positives. By emphasizing critical areas in the data, the model achieved a fine balance between accuracy and interpretability, enabling clinicians to better understand the reasoning behind predictions. The ConvNeXt backbone further contributed to the model’s efficiency, leveraging its advanced feature extraction capabilities to handle high-dimensional image data while maintaining a relatively low computational overhead compared to more resource-intensive transformer architectures. This makes the model particularly suitable for specialized medical imaging tasks where both precision and resource optimization are critical.
However, the model’s design also introduces tradeoffs that must be considered. While it excels in focusing on image-based tasks, its unimodal nature limits its ability to incorporate complementary data sources, such as patient history or genetic information, which multimodal models like CLIP and LLAVA can efficiently handle. Multimodal approaches offer greater flexibility and generalization across various tasks, enabling broader applications but often at the cost of higher computational demands and the need for extensive training datasets. Additionally, the added complexity of attention mechanisms, while beneficial for feature prioritization, can increase the model’s training and inference time, potentially posing challenges for real-time deployment in resource-limited settings. Thus, while the proposed model strikes a balance between accuracy, efficiency, and interpretability in a specific domain, its scope may need to be expanded or adjusted to compete with the versatility and broader applicability of multimodal frameworks.

5. Conclusions

This study introduced a novel approach for detecting skin cancer by combining a ConvNeXt-based CNN model with a modified attention mechanism, which is crucial for early detection and for distinguishing between visually similar skin conditions. The ConvNeXt architecture is effective for feature extraction, and a custom attention mechanism was integrated to distinguish between malignant and benign lesions in dermatoscopic images. The experimental findings on the HAM10000 dataset demonstrated a dramatic improvement in accuracy and a decrease in false positives and false negatives. This study also highlighted the potential of deep learning models for medical image processing, particularly when paired with attention mechanisms. The proposed method may aid in early diagnosis, reduce the impact of skin cancers on patients, and increase the chances of positive outcomes. By demonstrating the potential of ConvNeXt-based CNN models with attention mechanisms, this study established a framework for more accurate and efficient skin cancer diagnosis. However, further improvements are required, including expanding the dataset to cover a wider range of skin conditions, incorporating transfer learning from other medical imaging domains, adapting the model for real-time applications, developing interpretable AI methods, conducting clinical trials, and integrating multimodal data fusion to improve diagnostic accuracy. These advancements will benefit patients and healthcare professionals in the early diagnosis of skin cancer.

Author Contributions

Conceptualization, S.M.T., H.-S.P. and S.H.S.; Methodology, S.M.T. and H.-S.P.; Software, S.M.T., H.-S.P. and S.H.S.; Validation, S.M.T., H.-S.P. and S.H.S.; Formal analysis, S.M.T., H.-S.P. and S.H.S.; Investigation, S.M.T. and H.-S.P.; Resources, H.-S.P. and S.M.T.; Data curation, H.-S.P. and S.H.S.; Writing—original draft preparation, S.M.T.; Writing—review and editing, S.M.T., H.-S.P. and S.H.S.; Visualization, S.M.T. and H.-S.P.; Supervision, H.-S.P. and S.H.S.; Funding acquisition, H.-S.P. and S.H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alam, T.M.; Milhan, M.; Atif, M.; Wahab, A.; Mushtaq, M. Cervical cancer prediction through different screening methods using data mining. Int. J. Adv. Comput. Sci. Appl. 2019. [Google Scholar] [CrossRef]
  2. Shalhout, S.Z.; Kaufman, H.L.; Emerick, K.S.; Miller, D.M. Immunotherapy for Nonmelanoma skin cancer: Facts and Hopes. Clin. Cancer Res. 2022, 28, 2211–2220. [Google Scholar] [CrossRef] [PubMed]
  3. Mignion, L.; Desmet, C.M.; Harkemanne, E.; Tromme, I.; Joudiou, N.; Wehbi, M.; Baurain, J.-F.; Gallez, B. Noninvasive detection of the endogenous free radical melanin in human skin melanomas using electron paramagnetic resonance (EPR). Free Radic. Biol. Med. 2022, 190, 226–233. [Google Scholar] [CrossRef]
  4. Gururaj, H.L.; Manju, N.; Nagarjun, A.; Aradhya, V.N.M.; Flammini, F. DeepSkin: A deep learning approach for skin cancer classification. IEEE Access 2023, 11, 50205–50214. [Google Scholar] [CrossRef]
5. Tschandl, P.; Codella, N.; Akay, B.N.; Argenziano, G.; Braun, R.P.; Cabo, H.; Gutman, D.; Halpern, A.; Helba, B.; Hofmann-Wellenhof, R.; et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: An open, web-based, international, diagnostic study. Lancet Oncol. 2019, 20, 938–947.
6. American Cancer Society. Key Statistics for Melanoma Skin Cancer; American Cancer Society Center: Atlanta, GA, USA, 2022.
7. Riaz, L.; Qadir, H.M.; Ali, G.; Ali, M.; Raza, M.A.; Jurcut, A.D.; Ali, J. A Comprehensive Joint Learning System to Detect Skin Cancer. IEEE Access 2023, 11, 79434–79444.
8. Krishnan, M.M.R.; Venkatraghavan, V.; Acharya, U.R.; Pal, M.; Paul, R.R.; Min, L.C.; Ray, A.K.; Chatterjee, J.; Chakraborty, C. Automated oral cancer identification using histopathological images: A hybrid feature extraction paradigm. Micron 2012, 43, 352–364.
9. Akilandasowmya, G.; Nirmaladevi, G.; Suganthi, S.; Aishwariya, A. Skin cancer diagnosis: Leveraging deep hidden features and ensemble classifiers for early detection and classification. Biomed. Signal Process. Control 2024, 88, 105306.
10. Argenziano, G.; Soyer, H.P. Dermoscopy of pigmented skin lesions—A valuable tool for early diagnosis of melanoma. Lancet Oncol. 2001, 2, 443–449.
11. Kittler, H.; Pehamberger, H.; Wolff, K.; Binder, M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002, 3, 159–165.
12. Javaid, A.; Sadiq, M.; Akram, F. Skin cancer classification using image processing and machine learning. In Proceedings of the 2021 International Bhurban Conference on Applied Sciences and Technologies (IBCAST), Islamabad, Pakistan, 12–16 January 2021; IEEE: New York, NY, USA, 2021.
13. Yar, H.; Abbas, N.; Sadad, T.; Iqbal, S. Lung nodule detection and classification using 2D and 3D convolution neural networks (CNNs). In Artificial Intelligence and Internet of Things; CRC Press: Boca Raton, FL, USA, 2021; pp. 365–386.
14. George, M.; Zwiggelaar, R. Breast tissue classification using Local Binary Pattern variants: A comparative study. In Medical Image Understanding and Analysis, Proceedings of the 22nd Conference, MIUA 2018, Southampton, UK, 9–11 July 2018; Springer: Berlin/Heidelberg, Germany, 2018.
15. Milton, M.A.A. Automated skin lesion classification using ensemble of deep neural networks in ISIC 2018: Skin lesion analysis towards melanoma detection challenge. arXiv 2019, arXiv:1901.10802.
16. Wolner, Z.J.; Yélamos, O.; Liopyris, K.; Rogers, T.; Marchetti, M.A.; Marghoob, A.A. Enhancing skin cancer diagnosis with dermoscopy. Dermatol. Clin. 2017, 35, 417–437.
17. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023.
18. Taufiq, M.A.; Hameed, N.; Anjum, A.; Hameed, F. m-Skin Doctor: A mobile enabled system for early melanoma skin cancer detection using support vector machine. In eHealth 360°, Proceedings of the International Summit on eHealth, Budapest, Hungary, 14–16 June 2016; Revised Selected Papers; Springer: Berlin/Heidelberg, Germany, 2017.
19. Vidhyalakshmi, A.; Kanchana, M. AMLGB: Efficient Model for Skin Disease Detection and Classification using Adaptive Machine for Light Gradient Boosting. In Proceedings of the 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 23–25 January 2023; IEEE: New York, NY, USA, 2023.
20. Jaisakthi, S.M.; Mirunalini, P.; Aravindan, C. Automated skin lesion segmentation of dermoscopic images using GrabCut and k-means algorithms. IET Comput. Vis. 2018, 12, 1088–1095.
21. Masood, A.; Al-Jumaily, A.; Anam, K. Self-supervised learning model for skin cancer diagnosis. In Proceedings of the 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), Montpellier, France, 22–24 April 2015; IEEE: New York, NY, USA, 2015.
22. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118.
23. Nasr-Esfahani, E.; Samavi, S.; Karimi, N.; Soroushmehr, S.M.R.; Jafari, M.H.; Ward, K. Melanoma detection by analysis of clinical images using convolutional neural network. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; IEEE: New York, NY, USA, 2016.
24. Ghanshala, T.; Tripathi, V.; Pant, B. An efficient image-based skin cancer classification framework using neural network. In Research in Intelligent and Computing in Engineering; Springer: Singapore, 2021; pp. 851–858.
25. Hameed, N.; Shabut, A.; Hameed, F.; Cirstea, S.; Harriet, S.; Hossain, A. Mobile based skin lesions classification using convolution neural network. Ann. Emerg. Technol. Comput. (AETiC) 2020, 4, 26–37.
26. Subramanian, R.R.; Achuth, D.; Kumar, P.S.; Reddy, K.N.K.; Amara, S.; Chowdary, A.S. Skin cancer classification using convolutional neural networks. In Proceedings of the 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 28–29 January 2021; IEEE: New York, NY, USA, 2021.
27. Malo, D.C.; Rahman, M.M.; Mahbub, J.; Khan, M.M. Skin Cancer Detection using Convolutional Neural Network. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 26–29 January 2022; IEEE: New York, NY, USA, 2022.
28. Nampalle, K.B.; Raman, B. An efficient multi-functional deep learning model for effective medical image classification using skin lesion database. In Proceedings of the 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), Online, 2–4 August 2022; IEEE: New York, NY, USA, 2022.
29. Li, L.; Seo, W. Deep learning and transfer learning for skin cancer segmentation and classification. In Proceedings of the 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), Kragujevac, Serbia, 25–27 October 2021; IEEE: New York, NY, USA, 2021.
30. Agrahari, P.; Agrawal, A.; Subhashini, N. Skin cancer detection using deep learning. In Futuristic Communication and Network Technologies: Select Proceedings of VICFCNT 2020; Springer: Berlin/Heidelberg, Germany, 2022.
31. Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 27, Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; Volume 27.
32. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
33. Zhu, B.; Hofstee, P.; Lee, J.; Al-Ars, Z. An attention module for convolutional neural networks. In Artificial Neural Networks and Machine Learning, Proceedings of ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, 14–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2021.
34. Rensink, R.A. The dynamic representation of scenes. Vis. Cogn. 2000, 7, 17–42.
35. Corbetta, M.; Shulman, G.L. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 2002, 3, 201–215.
36. Qian, S.; Ren, K.; Zhang, W.; Ning, H. Skin lesion classification using CNNs with grouping of multi-scale attention and class-specific loss weighting. Comput. Methods Programs Biomed. 2022, 226, 107166.
37. Singh, H.; Devi, K.S.; Gaur, S.S.; Bhattacharjee, R. Automated Skin Cancer Detection using Deep Learning with Self-Attention Mechanism. In Proceedings of the 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 28–30 April 2023; IEEE: New York, NY, USA, 2023.
38. Castro-Fernández, M.; Hernández, A.; Fabelo, H.; Balea-Fernández, F.J.; Ortega, S.; Callicó, G.M. Towards Skin Cancer Self-Monitoring through an Optimized MobileNet with Coordinate Attention. In Proceedings of the 2022 25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August–2 September 2022; IEEE: New York, NY, USA, 2022.
39. Aladhadh, S.; Alsanea, M.; Aloraini, M.; Khan, T.; Habib, S.; Islam, M. An effective skin cancer classification mechanism via medical vision transformer. Sensors 2022, 22, 4008.
40. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
41. Yar, H.; Hussain, T.; Agarwal, M.; Khan, Z.A.; Gupta, S.K.; Baik, S.W. Optimized Dual Fire Attention Network and Medium-Scale Fire Classification Benchmark. IEEE Trans. Image Process. 2022, 31, 6331–6343.
42. Yang, J.; Li, C.; Dai, X.; Yuan, L.; Gao, J. Focal modulation networks. Adv. Neural Inf. Process. Syst. 2022, 35, 4203–4217.
43. Chaturvedi, S.S.; Gupta, K.; Prasad, P.S. Skin lesion analyser: An efficient seven-way multi-class skin cancer classification using MobileNet. In Proceedings of the International Conference on Advanced Machine Learning Technologies and Applications, Jaipur, India, 13–15 February 2020; Springer: Berlin/Heidelberg, Germany, 2020.
44. Huang, H.W.; Hsu, B.W.; Lee, C.; Tseng, V.S. Development of a light-weight deep learning model for cloud applications and remote diagnosis of skin cancers. J. Dermatol. 2021, 48, 310–316.
45. Shahin, A.H.; Kamal, A.; Elattar, M.A. Deep ensemble learning for skin lesion classification from dermoscopic images. In Proceedings of the 2018 9th Cairo International Biomedical Engineering Conference (CIBEC), Cairo, Egypt, 20–22 December 2018; IEEE: New York, NY, USA, 2018.
46. Carcagnì, P.; Leo, M.; Cuna, A.; Mazzeo, P.L.; Spagnolo, P.; Celeste, G.; Distante, C. Classification of skin lesions by combining multilevel learnings in a DenseNet architecture. In Proceedings of the 20th International Conference on Image Analysis and Processing (ICIAP 2019), Trento, Italy, 9–13 September 2019; Springer: Berlin/Heidelberg, Germany, 2019.
47. Chaturvedi, S.S.; Tembhurne, J.V.; Diwan, T. A multi-class skin cancer classification using deep convolutional neural networks. Multimedia Tools Appl. 2020, 79, 28477–28498.
48. Alsunaidi, S.J.; Almuhaideb, A.M.; Ibrahim, N.M.; Shaikh, F.S.; Alqudaihi, K.S.; Alhaidari, F.A.; Khan, I.U.; Aslam, N.; Alshahrani, M.S. Applications of big data analytics to control COVID-19 pandemic. Sensors 2021, 21, 2282.
Figure 1. Data preprocessing step to increase the number of samples in the skin cancer benchmark for adequate classification.
Figure 2. Generic overview of the proposed DCAN-Net for effective skin cancer classification.
Figure 3. Generic overview of the proposed attention mechanism for effective skin cancer classification.
Figure 4. Sample images of each class from the HAM10000 dataset.
Figure 5. Accuracy and loss graphs of the proposed model without the preprocessing step.
Figure 6. Accuracy and loss graphs of the proposed model with preprocessing.
Figure 7. Visualized results of the proposed model for effective skin cancer localization and classification.
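Lesion-localization heatmaps such as those in Figure 7 are typically produced with Grad-CAM. The following is a minimal sketch, assuming a trained Keras functional model; the layer name "last_conv" and the `model` variable are placeholders, not the paper's actual identifiers.

```python
# Minimal Grad-CAM sketch for heatmaps like those in Figure 7.
# Assumes a trained Keras functional classifier `model`; the layer
# name "last_conv" is a placeholder for the final convolutional block.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name="last_conv"):
    """Return a [0, 1] heatmap highlighting class-relevant regions."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_idx = int(tf.argmax(preds[0]))        # top predicted class
        top_score = preds[:, class_idx]
    grads = tape.gradient(top_score, conv_out)       # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # global-average-pool grads
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted feature maps
    cam = tf.nn.relu(cam)                            # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

The normalized map can then be resized to the input resolution and overlaid on the dermatoscopic image to visualize localization.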
Table 1. Data with and without preprocessing.

No. | Class | Before Preprocessing | After Preprocessing
1 | Akiec | 327 | 1099
2 | Bcc | 541 | 1099
3 | Bkl | 1099 | 1099
4 | Df | 155 | 1099
5 | Nv | 6705 | 6705
6 | Mel | 1113 | 1113
7 | Vasc | 142 | 1099
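The class balancing summarized in Table 1, where minority classes are oversampled to 1099 images, can be reproduced by appending augmented copies of each minority class. The sketch below is an assumption about the pipeline: the specific transforms (flips, rotations, brightness jitter) are illustrative choices, not the paper's exact augmentation recipe.

```python
# Sketch of minority-class oversampling to match Table 1
# (e.g., Akiec: 327 -> 1099). The transform choices are
# illustrative assumptions.
import random
import tensorflow as tf

def augment_once(image):
    """Apply one random geometric/photometric perturbation."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.rot90(image, k=random.randint(0, 3))
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image

def balance_class(images, target=1099):
    """Append augmented copies (list of image tensors) until the class
    reaches `target` samples; classes already at or above it are untouched."""
    out = list(images)
    while len(out) < target:
        out.append(augment_once(random.choice(images)))
    return out
```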
Table 2. Confusion matrix of the proposed model without the preprocessing steps.

Class | Akiec | Bcc | Bkl | Df | Mel | Nv | Vasc | Class-Wise Accuracy
Akiec | 0.98 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 98.00%
Bcc | 0.07 | 0.84 | 0.06 | 0.00 | 0.01 | 0.00 | 0.02 | 84.00%
Bkl | 0.02 | 0.03 | 0.86 | 0.01 | 0.03 | 0.05 | 0.00 | 86.00%
Df | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 100%
Mel | 0.03 | 0.01 | 0.07 | 0.00 | 0.80 | 0.09 | 0.00 | 80.00%
Nv | 0.00 | 0.00 | 0.01 | 0.00 | 0.03 | 0.95 | 0.01 | 95.00%
Vasc | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 100%
Table 3. Proposed skin cancer classification results without preprocessing.
Table 3. Proposed skin cancer classification results without preprocessing.
S: No.Class NamePrecisionRecallF1-Score
1AKiec0.920.980.95
2Bcc0.850.840.84
3Bkl0.800.860.83
4Df0.941.000.97
5Mel0.770.800.78
6Nv0.930.950.94
7Vasc0.991.000.99
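The values in Tables 2 and 3 (and, analogously, Tables 4 and 5) follow from the test-set predictions: the confusion matrices are row-normalized so each row sums to 1.00, the diagonal gives the class-wise accuracy, and precision/recall/F1 are computed per class. A minimal sketch, assuming integer-encoded labels `y_true` and `y_pred` as placeholders:

```python
# Reproducing the row-normalized confusion matrices (Tables 2 and 4)
# and per-class precision/recall/F1 (Tables 3 and 5) from predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

CLASSES = ["Akiec", "Bcc", "Bkl", "Df", "Mel", "Nv", "Vasc"]

def report(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred, labels=range(len(CLASSES)))
    cm_norm = cm / cm.sum(axis=1, keepdims=True)  # each row sums to 1.00
    class_wise_acc = np.diag(cm_norm)             # diagonal = class-wise accuracy
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=range(len(CLASSES))
    )
    for i, name in enumerate(CLASSES):
        print(f"{name}: acc={class_wise_acc[i]:.2%} "
              f"P={prec[i]:.2f} R={rec[i]:.2f} F1={f1[i]:.2f}")
```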
Table 4. Confusion matrix of the proposed model with preprocessing.

Class | Akiec | Bcc | Bkl | Df | Mel | Nv | Vasc | Class-Wise Accuracy
Akiec | 0.97 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 97.00%
Bcc | 0.01 | 0.95 | 0.01 | 0.00 | 0.02 | 0.01 | 0.00 | 95.00%
Bkl | 0.01 | 0.01 | 0.96 | 0.00 | 0.01 | 0.01 | 0.00 | 96.00%
Df | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 100%
Mel | 0.00 | 0.00 | 0.02 | 0.00 | 0.97 | 0.01 | 0.00 | 97.00%
Nv | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.98 | 0.01 | 98.00%
Vasc | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 100%
Table 5. Classification results of the proposed skin cancer classification model with preprocessing.

S. No. | Class Name | Precision | Recall | F1-Score
1 | Akiec | 0.96 | 0.97 | 0.97
2 | Bcc | 0.97 | 0.95 | 0.96
3 | Bkl | 0.95 | 0.96 | 0.95
4 | Df | 0.98 | 1.00 | 0.99
5 | Mel | 0.96 | 0.97 | 0.96
6 | Nv | 0.97 | 0.98 | 0.97
7 | Vasc | 1.00 | 1.00 | 1.00
Table 6. Performance of different models when evaluated with various optimizers.

S. No. | Optimizer | ResNet | Inception | DenseNet | EfficientNet | Xception | MobileNetV2
1 | SGD | 65.7 | 64.3 | 89.3 | 91.4 | 90.0 | 89.4
2 | Adamax | 94.3 | 93.6 | 95.7 | 92.2 | 91.2 | 89.7
3 | Nadam | 93.7 | 82.2 | 96.3 | 92.5 | 91.5 | 90.0
4 | Adagrad | 78.5 | 77.5 | 90.6 | 91.0 | 89.7 | 88.8
5 | Adadelta | 95.2 | 53.5 | 66.4 | 91.8 | 90.3 | 89.1
6 | RMSprop | 92.5 | 90.8 | 94.7 | 92.7 | 90.7 | 89.3
7 | Adam | 96.6 | 95.9 | 96.3 | 92.3 | 91.1 | 90.0
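A sweep like the one behind Table 6 retrains the same backbone once per optimizer and records the test accuracy. The sketch below is a minimal version of that loop under assumed settings; the `build_model`, `train_ds`, and `test_ds` names, the loss, and the epoch count are placeholders rather than the paper's exact configuration.

```python
# Sketch of the optimizer sweep behind Table 6: one fresh model
# per optimizer, same data, test accuracy recorded for each.
import tensorflow as tf

OPTIMIZERS = {
    "SGD": tf.keras.optimizers.SGD,
    "Adamax": tf.keras.optimizers.Adamax,
    "Nadam": tf.keras.optimizers.Nadam,
    "Adagrad": tf.keras.optimizers.Adagrad,
    "Adadelta": tf.keras.optimizers.Adadelta,
    "RMSprop": tf.keras.optimizers.RMSprop,
    "Adam": tf.keras.optimizers.Adam,
}

def sweep(build_model, train_ds, test_ds, epochs=30):
    """Return {optimizer name: test accuracy} for one backbone."""
    results = {}
    for name, opt_cls in OPTIMIZERS.items():
        model = build_model()  # fresh weights for every run
        model.compile(optimizer=opt_cls(),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(train_ds, epochs=epochs, verbose=0)
        _, acc = model.evaluate(test_ds, verbose=0)
        results[name] = acc
    return results
```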
Table 7. Comparison of the proposed model with state-of-the-art models.

Reference | Dataset | Precision | Recall | F1-Score | Accuracy
Gupta [43] | HAM10000 | 89.00 | 83.00 | 83.00 | 83.10
Huang [44] | KCGMH, HAM10000 | 75.18 | -- | -- | 85.80
Shahin [45] | ISIC 2018 | 86.20 | 79.60 | -- | 89.90
Carcagnì [46] | HAM10000 | 88.00 | 76.00 | 82.00 | 90.00
Chaturvedi [47] | HAM10000 | 88.00 | 88.00 | -- | 93.20
Alsunaidi [48] | King Fahd Hospital Dataset | 92.22 | 84.20 | 88.03 | 95.80
Aladhadh et al. [39] | HAM10000 | 96.00 | 96.50 | 97.00 | 96.14
DCAN-Net | HAM10000 | 97.00 | 97.57 | 97.10 | 97.57
Table 8. Ablation study of the proposed model by integrating various attention modules. All the results are given as percentages (%).

Method | Precision | Recall | F1-Score | Accuracy
Solo-ConvNeXtBase | 96.30 | 96.70 | 96.49 | 96.40
ConvNeXtBase + channel attention | 96.55 | 96.80 | 96.67 | 96.52
ConvNeXtBase + spatial attention | 96.88 | 97.15 | 97.01 | 97.00
ConvNeXtBase + channel + spatial attention | 97.00 | 97.57 | 97.10 | 97.57
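The ablation rows in Table 8 correspond to refining the ConvNeXtBase feature maps with channel attention, spatial attention, or both in sequence. A CBAM-style sketch of that channel-then-spatial refinement is given below; the reduction ratio and the 7x7 convolution are common defaults used here as assumptions, and the paper's modified modules may differ in detail.

```python
# CBAM-style sketch of the channel + spatial attention stages
# ablated in Table 8 (last row: channel followed by spatial).
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, ratio=8):
    """Reweight channels using pooled descriptors and a shared MLP."""
    ch = x.shape[-1]
    shared_mlp = tf.keras.Sequential([
        layers.Dense(ch // ratio, activation="relu"),
        layers.Dense(ch),
    ])
    avg = shared_mlp(layers.GlobalAveragePooling2D()(x))
    mx = shared_mlp(layers.GlobalMaxPooling2D()(x))
    scale = tf.sigmoid(avg + mx)[:, None, None, :]   # (B,1,1,C) channel weights
    return x * scale

def spatial_attention(x, kernel=7):
    """Reweight spatial positions from channel-pooled maps."""
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)  # (B,H,W,1)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    attn = layers.Conv2D(1, kernel, padding="same", activation="sigmoid")(
        tf.concat([avg, mx], axis=-1)
    )
    return x * attn                                  # (B,H,W,1) spatial weights

def refine(features):
    """Channel-then-spatial refinement, as in the final ablation row."""
    return spatial_attention(channel_attention(features))
```

In this sequencing, channel attention emphasizes lesion-relevant feature maps and spatial attention then suppresses background regions, which is consistent with the incremental gains across the ablation rows.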