1. Introduction
Brain tumors present a significant challenge in medical diagnostics and treatment, with over 100 distinct types that require timely and precise diagnosis to improve patient outcomes. In the United States, brain tumors are the fifth most common cancer, with approximately 80,000 new cases annually, 28% of which are malignant. These tumors are particularly prevalent among children, being the most common cancer in those aged 0–14 and the third most common in adolescents and young adults aged 15–39. Survival rates for brain tumors vary significantly depending on the type and stage at diagnosis. For example, glioblastomas have a dismal 5-year survival rate of just 5%, whereas meningiomas, typically less aggressive, have a survival rate of about 85% [1]. This disparity highlights the importance of early and accurate diagnosis, which can significantly influence treatment outcomes.
AI-driven approaches are increasingly being adopted for the early diagnosis of brain tumors, offering the potential to enhance accuracy and reduce diagnostic delays. These delays can have significant consequences, as observed in patients with malignant brain tumors, where delayed diagnosis leads to more complex treatments and poorer outcomes [2]. Early diagnosis improves clinical outcomes and results in considerable cost savings. For instance, research on malignant melanoma indicates that the average per-patient cost in the first 12 months after diagnosis is substantially lower for early-stage diagnoses, with potential cost savings of 40.55% due to early detection [3]. Similarly, a study on patients with malignant brain tumors found that those diagnosed early had significantly lower treatment costs [4].
Integrating AI into clinical practice therefore makes it possible to achieve both improved patient outcomes and reduced healthcare costs.
Recent studies have highlighted the effectiveness of various deep-learning models in accurately diagnosing brain tumors. One notable study by Dipu et al. investigated the use of YOLO and FastAi frameworks for detecting and classifying brain tumors using the BRATS 2018 dataset. Their findings showed that while YOLOv5 achieved an accuracy of 85.95%, the FastAi model outperformed it with an impressive 95.78% accuracy. This study highlights the potential of real-time brain tumor detection models in early diagnosis, which is crucial for improving patient outcomes [5].
Almufareh et al. also focused on object detection frameworks but compared YOLOv5 and YOLOv7 for brain cancer classification in MRI scans. Their approach incorporated advanced preprocessing techniques, such as mask alignment, to improve tumor region identification. YOLOv7 emerged as the superior model with a mean Average Precision (mAP) of 0.941 at an IoU threshold of 0.5, demonstrating its robustness in both box detection and mask segmentation [6].
In a different approach, Alhussainan et al. adapted the YOLO architecture specifically to classify meningioma firmness. They implemented transfer learning and data augmentation to enhance the model's robustness, achieving outstanding results with YOLOv7, including a 99.96% mean average precision and 100% sensitivity. This study emphasizes the importance of model adaptation and fine-tuning in achieving high performance in specialized tasks [7].
Iriawan et al. combined CNN and FCN architectures for brain tumor detection and segmentation using a YOLO-CNN for localization and an FCN-UNet for segmentation. Their methodology achieved a high correct classification ratio (CCR) of 97%, highlighting the effectiveness of combining detection and segmentation techniques for precise tumor delineation [8].
Elazab et al. proposed a hybrid approach combining YOLOv5 with ResNet50 to enhance tumor localization and feature extraction in histopathology images. By integrating an extreme gradient boosting classifier, their model achieved 97.2% accuracy in grading gliomas, surpassing traditional methods. This novel approach has the benefit of combining different models to tackle the complexities of tumor classification in histopathology [9].
Sadad et al. utilized the U-Net architecture with ResNet50 as a backbone for brain tumor segmentation, achieving an IoU score of 0.9504. They also applied evolutionary algorithms and reinforcement learning for multi-classification, with NASNet achieving a remarkable 99.6% accuracy. This study highlights the effectiveness of various deep learning models and techniques in enhancing classification performance [10].
Ari and Hanbay introduced a three-stage method that preprocesses MR images, classifies them using an ELM-LRF model, and segments the tumor regions. Their method achieved a classification accuracy of 97.18%, demonstrating its utility in computer-aided brain tumor detection. This approach emphasizes the importance of preprocessing and tailored classification models in improving detection accuracy [11].
Saba et al. explored using the GrabCut method for segmentation and a VGG-19-based transfer learning model for feature extraction. Their model achieved near-perfect Dice similarity coefficients across multiple datasets, showing the potential of combining handcrafted and deep-learning features for accurate tumor segmentation [12].
Shoaib et al. evaluated several models for brain tumor detection, including a transfer learning model and a custom CNN. Their findings indicated that transfer learning and BRAIN-TUMOR-net models achieved high accuracies, making them suitable for real-world diagnostic applications. This study underscores the versatility and effectiveness of transfer learning in medical imaging tasks [13].
Alsubai et al. developed a CNN-LSTM hybrid model that achieved 99.1% accuracy in classifying brain tumors. By integrating CNN for feature extraction with LSTM for sequence prediction, this model offers a robust solution for MRI-based tumor classification, highlighting the potential of hybrid architectures in medical diagnostics [14].
Raza et al. introduced DeepTumorNet, a modified GoogLeNet architecture that achieved exceptional results in brain tumor classification with a 99.67% accuracy. Their approach involved significant architectural modifications and a leaky ReLU activation function, setting a new benchmark in the field [15].
Shah et al. fine-tuned an EfficientNet-B0 model for binary brain tumor classification, achieving 98.87% accuracy. Their extensive preprocessing and data augmentation efforts contributed to this high performance, showcasing the efficiency of fine-tuning in improving model accuracy for specific tasks [16].
Talukder et al. proposed a transfer learning approach that involved extensive preprocessing and architecture reconstruction, achieving 99.68% accuracy with ResNet50V2. This study demonstrates the importance of fine-tuning and additional layer incorporation in optimizing transfer learning models for brain tumor classification [17].
Finally, Li et al. proposed a multi-CNN method for brain tumor detection that combines multimodal information fusion with 3D-CNNs. Their approach, tested on the MICCAI BraTS 2018 dataset, significantly improved detection accuracy and specificity, highlighting the advantages of 3D networks and multimodal fusion in medical imaging [18,19].
The fusion approach and fine-tuning techniques observed in these studies have been integral in informing the current methodology. A fine-tuned EfficientNetB0 model with a fusion dense layer has been applied for multiclass classification of meningioma, pituitary, glioma, and no-tumor cases, using the strengths identified in prior research to enhance classification accuracy and robustness. To further evaluate the impact of transfer learning and fine-tuning, comparisons were made between the fine-tuned EfficientNetB0, its non-fine-tuned counterpart with a fusion dense layer, and a fusion CNN model. These models were assessed using metrics such as accuracy, precision, recall, F1-score, and confusion matrix, highlighting the critical role of fine-tuning in achieving superior performance.
The novelty of this study lies in the use of a fine-tuned EfficientNetB0 model with a fusion dense layer for multiclass brain tumor classification, focusing on meningioma, pituitary, glioma, and normal cases. While many existing studies tend to concentrate on binary classification or transfer learning without further optimization, this work shows how fine-tuning can lead to significant improvements in performance. By comparing the fine-tuned model with a non-fine-tuned version and a fusion CNN model, this research highlights how fine-tuning enhances classification accuracy, precision, recall, and F1-score. What sets this study apart is the in-depth evaluation of these models using a range of key metrics, offering valuable insights into how optimized transfer learning can enhance the accuracy of brain tumor classification. The combination of fine-tuning and fusion architecture in this context fills an important gap in multiclass classification research and holds great promise for improving diagnostic workflows in medical practice.
2. Materials and Methods
2.1. Description of Deep Learning Model Layers
Convolutional layer: The convolutional layer serves as the cornerstone of feature extraction in CNNs, particularly when dealing with image data such as brain tumor scans. In this layer, the network applies a series of learnable filters, or kernels, across the input images. Each filter slides over the image in a process called convolution, which involves computing dot products between the filter and localized regions of the input image. This operation generates a set of feature maps, each emphasizing different aspects of the image, such as edges, textures, and patterns. For brain tumor detection, these feature maps are instrumental in capturing the nuanced spatial characteristics of tumors, which may include irregular shapes, varying textures, and distinct boundaries. The convolutional layer's ability to detect these low-level features forms the basis for higher-level abstractions in subsequent layers, enabling the model to differentiate between healthy and tumorous tissue. By stacking multiple convolutional layers, the network progressively captures more complex patterns, which is critical for accurate diagnosis.
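To make the convolution operation concrete, the following minimal NumPy sketch (illustrative only, not the implementation used in this study) shows a single filter sliding over a single-channel image and computing a dot product at each position:

```python
import numpy as np

def conv2d_single(image, kernel):
    # Valid-mode 2D convolution of one channel with one kernel:
    # each output value is the dot product between the kernel and
    # the corresponding localized patch of the input image.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A crude vertical-edge detector applied to a toy 5x5 "image".
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = conv2d_single(np.random.rand(5, 5), edge_filter)  # shape (3, 3)
```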
Max pooling layer: The max pooling layer follows the convolutional layers and plays a crucial role in spatial downsampling, reducing the dimensionality of the feature maps while preserving the most significant information. This layer divides each feature map into non-overlapping or overlapping regions and selects the maximum value from each region. The operation is typically performed using a fixed window size and stride, which determine how far the pooling window shifts across the feature map.
By retaining only the most prominent features, max pooling effectively reduces the spatial dimensions of the feature maps. This reduction has several advantages: it decreases the computational load by reducing the number of parameters in subsequent layers, mitigates the risk of overfitting by simplifying the feature maps, and introduces a degree of translation invariance, meaning the model can better recognize features regardless of their exact location in the image. In the context of brain tumor detection, max pooling helps focus on the most critical tumor characteristics, such as prominent edges or high-intensity regions, while discarding less relevant details.
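As a brief illustration (a sketch, not the study's code), non-overlapping 2 × 2 max pooling halves each spatial dimension while keeping the strongest response in every window:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Keep the maximum value in each pooling window, reducing an
    # (H, W) feature map to roughly (H / stride, W / stride).
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    return np.array([[fmap[i * stride:i * stride + size,
                           j * stride:j * stride + size].max()
                      for j in range(w)] for i in range(h)])

pooled = max_pool(np.random.rand(8, 8))  # shape (4, 4)
```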
Global average pooling layer: The global average pooling (GAP) layer serves as an alternative to traditional fully connected layers, providing a more compact and efficient representation of the feature maps. Unlike max pooling, which retains spatial information by selecting maximum values within local regions, global average pooling operates on an entire feature map by averaging all the values within each map. This process condenses the spatial dimensions into a single scalar value per feature map, resulting in a one-dimensional feature vector. In brain tumor classification models, the GAP layer reduces the risk of overfitting by minimizing the number of parameters while maintaining the essential features learned by the network. By averaging across the entire feature map, this layer ensures that the model captures the global context of the tumor, considering all regions of the feature map equally. The resulting feature vector is fed into the final classification layers, contributing to the overall decision-making process.
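The shape arithmetic of GAP is simple enough to show directly; the following sketch assumes the (None, 8, 8, 1280) feature-map shape quoted later for EfficientNetB0:

```python
import numpy as np

# One batch element of feature maps, shaped (batch, height, width, channels).
feature_maps = np.random.rand(1, 8, 8, 1280)

# GAP averages each 8x8 map down to a single scalar per channel,
# producing a compact (1, 1280) feature vector.
gap_vector = feature_maps.mean(axis=(1, 2))
```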
Dense layer: Dense layers, also known as fully connected layers, are integral to the final stages of a CNN, where they synthesize the features extracted by previous layers into a comprehensive, high-dimensional vector. Each neuron in a dense layer is fully connected to every neuron in the preceding layer, allowing for a complete combination and interaction of the features. This dense connectivity enables the network to learn complex representations and perform sophisticated decision-making tasks. In the proposed brain tumor detection model, two parallel dense pathways are employed, each configured with different numbers of neurons. This design allows the model to capture various levels of abstraction and detail. One pathway might focus on broader, more general features, while the other emphasizes finer, more specific information. By processing features through these parallel pathways, the model can generate a richer and more nuanced representation, ultimately enhancing its ability to classify different types of brain tumors with higher accuracy.
Dropout layer: The dropout layer is a regularization technique designed to prevent overfitting, a common issue in deep learning models where the model performs well on training data but fails to generalize unseen data. During training, dropout randomly deactivates a fraction of the neurons in the layer, setting their output to zero. This random deactivation forces the network to learn redundant data representations as it cannot rely on any single neuron or subset of neurons. In the context of brain tumor classification, dropout improves the model’s robustness by ensuring that it does not become overly dependent on specific features or patterns that might be present in the training data but absent in new, unseen data. By introducing variability in the learning process, dropout encourages the model to develop a more general understanding of the features that distinguish tumors from healthy tissue, thereby improving its performance on new data.
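A minimal sketch of inverted dropout (the variant used by most frameworks; shown here for illustration only, not as the study's implementation):

```python
import numpy as np

def dropout(x, rate=0.5, training=True):
    # Randomly zero a fraction `rate` of activations during training and
    # rescale the survivors so the expected activation sum is unchanged.
    if not training:
        return x
    mask = np.random.rand(*x.shape) >= rate
    return x * mask / (1.0 - rate)

activations = dropout(np.random.rand(1, 64))  # roughly half set to zero
```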
Concatenate layer: The concatenate layer is a merging operation that plays a crucial role in models with parallel processing pathways, such as the one proposed for brain tumor classification. This layer combines multiple feature vectors, typically the outputs of different layers or pathways, into a single, comprehensive vector. By merging these diverse feature representations, the model can utilize the strengths of each pathway and integrate information from different aspects of the input data. In the fusion CNN model, the concatenate layer merges the outputs from the two parallel dense pathways, which have captured different levels of abstraction and detail. The resulting vector contains richer features, combining broad, high-level patterns with finer, more detailed information. This fusion of features enhances the model’s ability to classify brain tumors accurately as it can draw on a more complete understanding of the input data.
ReLU activation layer: The ReLU (Rectified Linear Unit) activation layer is a nonlinear transformation applied to the output of neurons, and it is a critical component of deep learning models. ReLU works by setting all negative inputs to zero while leaving positive inputs unchanged, introducing nonlinearity into the model. This nonlinearity is essential for the network's ability to learn complex patterns and representations. ReLU activation functions are applied after each convolutional and dense layer in the proposed brain tumor classification model. By introducing nonlinearity, ReLU enables the network to capture and learn from a wide range of features, from simple edges to complex textures and patterns that distinguish different types of brain tumors. Without nonlinear activation functions like ReLU, the model could learn only linear relationships, severely restricting its capacity to handle the complex, high-dimensional data inherent in medical images.
Softmax output layer: The softmax output layer is the final layer of the classification model, responsible for producing probability distributions over the different classes. In the context of brain tumor classification, this layer takes the final feature vector generated by the network and applies the softmax function to it, converting the vector's raw values into probabilities. Each value in the output vector corresponds to a specific class, such as a particular type of brain tumor, and the values sum to one, representing a probability distribution. The softmax function ensures the model's output is interpretable and actionable, providing a clear probabilistic prediction for each class. The class with the highest probability is typically selected as the model's final prediction, but the probabilities also offer insight into the model's confidence in its predictions. This probabilistic output is crucial in medical applications, where understanding the certainty of a diagnosis is just as important as the diagnosis itself.
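The softmax transformation itself can be written in a few lines; the four logits below are placeholder values standing in for the four tumor classes:

```python
import numpy as np

def softmax(logits):
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the resulting probabilities.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

probs = softmax(np.array([2.1, 0.3, -1.0, 0.8]))  # sums to 1.0
predicted_class = int(probs.argmax())
```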
Table 1 provides the technical specifications of deep learning model layers for brain tumor detection and classification.
2.2. Data Preparation and Augmentation
The dataset used in this study is a publicly available brain tumor MRI dataset sourced from Kaggle, which is widely used in similar research for benchmarking purposes. This dataset contains 3064 images divided into four categories: normal, glioma, meningioma, and pituitary tumors, with the following class distributions (Table 2):
While the dataset demonstrates moderate class balance, it is acknowledged that slight variations exist. To address this, we applied data augmentation techniques, including rotations, flips, and intensity adjustments, to ensure that the model encountered diverse representations of each class during training. Additionally, class weighting was employed to account for the discrepancies in sample sizes, reducing the risk of model bias toward larger classes.
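The paper does not specify its exact weighting scheme; one common way to derive such class weights, assuming integer labels in a y_train array, is scikit-learn's balanced heuristic:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy integer labels (0-3, one per image); real counts come from Table 2.
y_train = np.array([0, 1, 1, 2, 3, 3, 3])

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced",
                               classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))  # passed to model.fit(...)
```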
Regarding practical representativeness, the dataset includes MRI scans from multiple patients, introducing variability in anatomical structures and imaging conditions. However, as it is a curated dataset, it may not capture the full spectrum of complexities, such as noise variations, imaging artifacts, or scanner differences. These limitations are noted in the discussion section of the manuscript, along with suggestions for future work to validate the proposed model on more diverse and expansive datasets.
The dataset utilized in this study underwent a data preparation process to ensure its quality and suitability for subsequent analysis. Given the inherently complex and detailed nature of MRI scans, preprocessing is essential to enhance the accuracy and performance of AI models. MRIs present unique challenges due to the variability in image quality, contrast, and the presence of artifacts, making it crucial to apply robust preprocessing techniques. Effective preprocessing not only improves model performance but also supports medical professionals by enhancing AI’s ability to detect subtle details, thereby reducing the risk of missing critical information during diagnosis.
Initially, a data-cleaning pipeline was implemented, beginning with the removal of duplicate samples. This was achieved using an image vector comparison method, systematically identifying and eliminating duplicate entries to guarantee that each data point remained unique throughout the dataset. Additionally, the dataset was corrected for mislabeled images using domain expertise to inspect and rectify inaccuracies. This step was crucial in ensuring the dataset’s accuracy and reliability by accurately categorizing images according to their content.
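As a rough stand-in for the image vector comparison described above (the paper does not give its exact procedure), exact duplicates can be removed by hashing each image's raw bytes:

```python
import numpy as np

def deduplicate(images):
    # Keep only the first occurrence of each pixel-identical image.
    seen, unique = set(), []
    for img in images:
        key = hash(img.tobytes())
        if key not in seen:
            seen.add(key)
            unique.append(img)
    return unique

unique_images = deduplicate([np.zeros((4, 4)), np.zeros((4, 4))])  # length 1
```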
Furthermore, all images within the dataset were uniformly resized to dimensions of (224, 224). This resizing standardized the image dimensions and optimized memory usage, facilitating efficient processing and analysis. Following these cleaning procedures, the dataset's statistical integrity was maintained, with a reduction of approximately 3–9% in the number of samples per category.
In addition to cleaning the data, several augmentation techniques were applied to improve the diversity and robustness of the dataset. These methods aimed not only to increase the number of images but also to introduce meaningful variations that would help the model perform better across different scenarios. For example, salt-and-pepper noise was added with a probability of 0.02–0.05 to simulate real-world noise and make the model more resistant in noise-sensitive situations. To improve the clarity and details of the images, histogram equalization was used, which also helped standardize image intensities across different samples. The images were rotated between −15° and 15° to give the model exposure to various orientations, and brightness adjustments were made by modifying pixel intensity values between 0.8 and 1.2, simulating changes in lighting. Finally, horizontal and vertical flipping operations were applied to create mirror images, further diversifying the dataset and ensuring that the model learned to recognize tumors from multiple orientations. These augmentation strategies were crucial in preparing a robust dataset capable of generalizing well to new data while maintaining label consistency across augmented samples.
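An illustrative OpenCV sketch of these augmentations is given below; the parameter values mirror the ranges above, but the helper functions are assumptions rather than the study's exact code:

```python
import cv2
import numpy as np

def add_salt_pepper(img, prob=0.02):
    # Flip a fraction `prob` of pixels to black or white to simulate noise.
    noisy = img.copy()
    rnd = np.random.rand(*img.shape[:2])
    noisy[rnd < prob / 2] = 0
    noisy[rnd > 1 - prob / 2] = 255
    return noisy

def augment(img, angle=15, brightness=1.2):
    # Equalize, rotate within +/-15 degrees, adjust brightness, then mirror.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    out = cv2.equalizeHist(gray)
    h, w = out.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(out, m, (w, h))
    out = np.clip(out.astype(np.float32) * brightness, 0, 255).astype(np.uint8)
    return cv2.flip(out, 1)  # horizontal mirror; flip code 0 gives vertical
```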
For model training and evaluation, a standard train–validation–test split (80%–10%–10%) was implemented using a function that partitioned the dataset into training, validation, and test subsets. The split was carried out prior to applying any augmentation techniques, meaning the test data were never augmented. Once the split was completed, augmentation techniques such as adding noise, adjusting brightness, rotating, and flipping images were applied to the training and validation sets only. This approach keeps the test set untouched, allowing for an unbiased evaluation of the model's performance on unseen data. The split also ensured that each class within the dataset was represented proportionally across all subsets, minimizing bias and facilitating robust model evaluation (Table 3). Through these meticulous data preparation and augmentation processes, the dataset was effectively refined and enriched, laying a solid foundation for the machine learning experiments and analyses in this study.
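A two-step stratified split reproduces this 80-10-10 scheme; the placeholder arrays below stand in for the cleaned, pre-augmentation data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays; in the study these are the cleaned MRI images/labels.
images = np.random.rand(100, 224, 224)
labels = np.random.randint(0, 4, size=100)

# First carve off 20%, then halve it into validation and test sets,
# stratifying on the labels so every class stays proportionally represented.
x_train, x_rest, y_train, y_rest = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
```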
2.3. Proposed Model
The proposed model for brain tumor classification (Figure 1) utilizes the EfficientNetB0 architecture, which is a highly efficient and effective CNN. EfficientNetB0 stands out due to its compound scaling approach, which scales the depth, width, and resolution of the network uniformly. This scaling allows for improved performance with fewer parameters compared to traditional architectures like ResNet or VGG, making it an ideal choice for resource-constrained environments, such as medical imaging tasks, where both accuracy and efficiency are paramount. EfficientNetB0 achieves a better trade-off between performance and resource utilization, which is critical in medical fields, where high-quality images need to be processed within limited computational budgets.
The model's input layer is designed to accept images of size 256 × 256 pixels with three RGB channels, shaping the input as (None, 256, 256, 3). The image dimensions are chosen to balance computational efficiency and the retention of sufficient detail for brain tumor detection. Image preprocessing includes resizing the input to fit EfficientNetB0's original design, which typically expects a 224 × 224 pixel input for its pre-trained weights. This resizing ensures compatibility with the model and prepares the images for further processing. The model uses pre-trained EfficientNetB0 weights from ImageNet, which helps transfer valuable features learned from large-scale datasets to the brain tumor classification task. The model removes the top classification layers of EfficientNetB0 to use it as a feature extractor, allowing the network to focus on learning the unique patterns associated with brain tumors while retaining the generalization capability from pre-trained features.
To adapt the features learned by EfficientNetB0 to the brain tumor classification task, the model employs fine-tuning. This involves leaving the first 232 layers of EfficientNetB0 frozen and training the last five layers. Fine-tuning these final layers helps the model specialize in detecting tumor-specific features while still benefiting from the generic image representations learned from ImageNet. This selective fine-tuning strategy is especially important for medical imaging tasks, where the amount of labeled data is often limited, and using pre-trained models for transfer learning is crucial for achieving high performance with minimal labeled data.
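A minimal Keras sketch of this selective fine-tuning strategy is shown below (layer indexing follows the paper's description; exact layer counts vary slightly across Keras versions):

```python
from tensorflow.keras.applications import EfficientNetB0

# Load ImageNet weights without the top classification layers so the
# network serves purely as a feature extractor.
base_model = EfficientNetB0(include_top=False,
                            weights="imagenet",
                            input_shape=(256, 256, 3))

# Freeze all but the last five layers; only those are fine-tuned.
for layer in base_model.layers[:-5]:
    layer.trainable = False
for layer in base_model.layers[-5:]:
    layer.trainable = True
```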
Following the feature extraction process, the model includes a global average pooling (GAP) layer, which reduces the spatial dimensions of the feature maps generated by EfficientNetB0. The output of the EfficientNetB0 model, which has a shape of (None, 8, 8, 1280), is passed through the GAP layer, converting it into a one-dimensional vector of size (None, 1280). GAP helps reduce the model’s complexity by summarizing spatial information into a fixed-size vector, which not only makes the model more computationally efficient but also mitigates the risk of overfitting by reducing the number of trainable parameters.
The next part of the model architecture involves two parallel dense pathways, which allow the model to learn different levels of abstraction in the features. The first pathway contains three dense layers with 128, 64, and 32 neurons, respectively. These layers are designed to capture high-level abstract features that help the model understand complex relationships within the data. The second dense pathway consists of three layers with 64, 32, and 16 neurons, respectively, designed to capture more detailed, granular features of the brain tumors. This dual-pathway approach allows the model to combine both broad and specific information, enabling it to make better-informed decisions when distinguishing between tumor types.
After the dense layers have processed the features, their outputs are merged through a concatenate layer, combining the high-level and low-level features from both pathways into a unified vector. This combined vector captures a richer set of features, allowing the model to make more accurate predictions by utilizing the strengths of both pathways. The final output layer is a dense layer with a softmax activation function, which generates the classification probabilities for each brain tumor class. This layer outputs the probability distribution over the possible tumor types, ensuring the model can classify the image into one of the four predefined classes: normal, glioma, meningioma, and pituitary tumor.
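Continuing the sketch above, the dual-pathway head described in this section could be assembled with the Keras functional API roughly as follows (names and exact wiring are illustrative):

```python
from tensorflow.keras import layers, models

# Feature vector from the GAP layer, shape (None, 1280).
x = layers.GlobalAveragePooling2D()(base_model.output)

# Pathway 1: broader, high-level abstractions (128 -> 64 -> 32 neurons).
p1 = layers.Dense(128, activation="relu")(x)
p1 = layers.Dense(64, activation="relu")(p1)
p1 = layers.Dense(32, activation="relu")(p1)

# Pathway 2: finer, more granular features (64 -> 32 -> 16 neurons).
p2 = layers.Dense(64, activation="relu")(x)
p2 = layers.Dense(32, activation="relu")(p2)
p2 = layers.Dense(16, activation="relu")(p2)

# Merge both levels of abstraction and classify into the four classes.
merged = layers.Concatenate()([p1, p2])
outputs = layers.Dense(4, activation="softmax")(merged)

model = models.Model(inputs=base_model.input, outputs=outputs)
```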
For model training, the Adam optimizer is used as it is well suited for tasks involving sparse gradients and large-scale data. The learning rate is set to 3 × 10−4, which strikes a balance between convergence speed and stability during training. A weight decay of 1 × 10−5 is also employed to prevent overfitting by penalizing large weights. The model is trained for 20 epochs with a batch size of 32, a typical choice that balances memory usage and convergence speed. Early stopping is used to monitor validation accuracy during training, ensuring the model does not overfit by halting training when performance on the validation set no longer improves.
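These hyperparameters translate into the following training sketch (assumes TensorFlow ≥ 2.11, where Keras provides AdamW with decoupled weight decay; the patience value and array names are assumptions):

```python
import tensorflow as tf

# AdamW applies the decoupled weight decay described above.
optimizer = tf.keras.optimizers.AdamW(learning_rate=3e-4, weight_decay=1e-5)

model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=3,                    # patience value is an assumption
    restore_best_weights=True)

# x_train/y_train and x_val/y_val are the prepared splits from Section 2.2,
# with labels one-hot encoded to match the categorical cross-entropy loss.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=20, batch_size=32,
                    callbacks=[early_stop])
```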
The performance of the model is evaluated using several metrics, including accuracy, precision, recall, and F1-score. These metrics provide a comprehensive evaluation of the model’s performance, ensuring it can correctly classify brain tumor types while minimizing false positives and false negatives. The confusion matrix is also used to assess how well the model distinguishes between the different tumor types, showing the number of true positives, false positives, true negatives, and false negatives for each class. The use of precision and recall allows for a deeper analysis of the model’s ability to make accurate predictions for each individual class, especially when dealing with imbalanced datasets.
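These metrics can be computed in one pass with scikit-learn, as sketched below (the class ordering is assumed to match the dataset's four categories):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_prob = model.predict(x_test)            # per-class probabilities
y_pred = np.argmax(y_prob, axis=1)        # predicted class indices
y_true = np.argmax(y_test, axis=1)        # assumes one-hot encoded labels

print(classification_report(
    y_true, y_pred,
    target_names=["normal", "glioma", "meningioma", "pituitary"]))
print(confusion_matrix(y_true, y_pred))
```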
Finally, the model is regularly assessed using train and validation loss and accuracy curves, which track the model’s progress over time. These curves help ensure that the model is learning effectively and not overfitting, providing insights into the behavior of the model during training. The final trained model is robust, efficient, and capable of making reliable classifications of brain tumors based on MRI images, which is crucial for assisting medical professionals in diagnosing and treating patients.
Overall, the architecture (Table 4) combines the efficiency of EfficientNetB0's feature extraction, the specialization achieved through fine-tuning, and the flexibility offered by parallel dense pathways. This design allows the model to make highly accurate classifications, even when working with limited training data, which is typical in medical imaging tasks. By using transfer learning and advanced deep learning techniques, the model strikes a balance between high accuracy and computational efficiency, making it suitable for deployment in practical medical applications where both factors are critical. A direct performance comparison table (Table 5), contrasting the proposed model with methods such as ResNet50V2, DeepTumorNet, and CNN-LSTM, is also presented.
2.4. Experimental Setup
The experimental setup of this study (Figure 2) involves implementing and evaluating three different models for brain tumor classification using MRI images. The experiments were conducted in Google Colab, utilizing a T4 GPU to accelerate model training and evaluation. The dataset of brain tumor MRI images was initially divided into training, validation, and test sets with an 80-10-10 split to ensure a balanced assessment across all stages of the models' development.
Before feeding the images into the models, preprocessing steps were conducted to remove the black outer regions commonly present in MRI scans, focusing the model’s attention solely on the brain regions. This preprocessing was carried out automatically to enhance the relevancy of the input data and improve the model’s learning efficiency.
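One simple automatic cropping routine consistent with this description (a sketch under the assumption of near-black backgrounds, not the study's exact code) is:

```python
import cv2
import numpy as np

def crop_brain_region(img, threshold=10):
    # Crop to the bounding box of pixels brighter than `threshold`,
    # removing the black outer border of the MRI scan.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    ys, xs = np.where(gray > threshold)
    if ys.size == 0:
        return img  # blank image: nothing to crop
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```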
The first model, the proposed model, employs a transfer learning approach using the EfficientNetB0 architecture as the base model. EfficientNetB0 was initialized with pre-trained weights from ImageNet, and the top layers were removed. The base model was then fine-tuned, with the last five layers set to be trainable to adapt the features specifically to the brain tumor dataset. At the same time, the earlier layers remained frozen to preserve the learned low-level features. The input images were resized to 256 × 256 pixels to match the required input size of the EfficientNetB0 model. The architecture following the base model included a series of dense layers with ReLU activations that branched into two separate dense layers concatenated before the final classification layer. The output layer used a softmax activation function to classify the images into one of the predefined categories.
During training, the model was compiled with a categorical cross-entropy loss function, optimized using the Adam optimizer with a learning rate of 3 × 10−4 and a weight decay of 1 × 10−5 to prevent overfitting. The model was trained for 20 epochs, with a checkpoint callback set up to save the best-performing model based on validation accuracy. The training process also included the generation of a model plot to visualize the architecture and ensure the model was structured as intended. The evaluation of the proposed model involved testing it on the unseen test data, where its accuracy was calculated and further assessed using a confusion matrix to provide a detailed breakdown of the model’s performance across different classes.
This process was the same for the other two models for comparison purposes. The second model, the fusion CNN, was a custom convolutional neural network consisting of two parallel pipelines: the MainModel and LowerModel. Each pipeline involved several convolutional layers with batch normalization and ReLU activations, followed by global average pooling. The outputs of these two pipelines were concatenated and fed into fully connected layers with dropout regularization before the final softmax classification layer. This model was designed to explore a more traditional CNN architecture and its effectiveness compared to transfer learning methods.
The third model, the transfer learning feature extraction model, used the same EfficientNetB0 architecture as the proposed model, but the base model's weights were entirely frozen, meaning that the feature extraction layers were not fine-tuned on the brain tumor dataset. Instead, the pre-trained weights were used as-is to extract features, which were then passed through a dense layer architecture similar to that of the proposed model. This model was designed to compare the benefits of fine-tuning versus feature extraction without fine-tuning in a transfer learning context.
Each model was trained and evaluated using the same training, validation, and test splits to ensure a fair comparison. The evaluation metrics included overall accuracy, confusion matrices, precision, recall, F1-score, and train and validation loss and accuracy curves, which provided insights into the models’ ability to correctly classify the brain tumor images. The comparative analysis aimed to highlight the strengths and limitations of each approach, with a particular focus on how fine-tuning and feature extraction in transfer learning influence model performance relative to a custom CNN.
4. Discussion
In the initial stages of this study, YOLOv8 was considered for implementation due to its potential for real-time detection of tumors within MRI images. This approach would have enabled not only the classification of tumors but also their precise localization, thereby enhancing the clinical relevance of the model. Previous studies utilizing YOLOv7 and YOLOv5 had demonstrated strong performance in similar tasks, making YOLOv8 a promising candidate. However, implementing YOLOv8 would have required a dataset annotated for both classification and detection tasks.
It was therefore necessary to annotate a new dataset suitable for training both CNNs and YOLOv8. The annotation process, particularly for MRI images, required a high level of expertise to identify and mark the tumor regions accurately. The lack of a radiologist's involvement in the annotation process presented significant challenges, ultimately preventing the creation of a dataset that could effectively train YOLOv8. As a result, the focus shifted towards classification tasks using CNN architectures, where the available data could be better utilized.
For future research, collaboration with radiologists is essential to ensure the accuracy of annotations, which is crucial for the practical training of models like YOLOv8. Additionally, to evaluate whether the differences in the performance of the models were statistically significant, the use of McNemar’s test was considered. This non-parametric test is well suited for comparing two classification models on the same dataset as it examines the significance of differences in their predictions. McNemar’s test is particularly valuable in scenarios where overall accuracies might be similar but the specific predictions differ, providing a deeper insight into model performance.
Incorporating McNemar’s test in future studies would allow for a better comparison between the proposed model and the transfer learning feature extraction model, which had similar results, helping to determine if the observed differences in performance are statistically significant. While the current study focused on CNN-based classification due to the challenges of dataset annotation, future work involving YOLOv8, with an adequately annotated dataset in collaboration with radiologists, holds the potential to advance the capabilities of real-time tumor detection and classification in MRI images.
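For reference, McNemar's test on two models' paired predictions can be run with statsmodels as follows; the disagreement counts shown are placeholders, not results from this study:

```python
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table over the same test set:
# rows = model A correct/wrong, columns = model B correct/wrong.
# The off-diagonal cells (disagreements) drive the test statistic.
table = [[50, 4],
         [9, 37]]

result = mcnemar(table, exact=True)  # exact binomial test for small counts
print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")
```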
5. Conclusions
The primary motivation behind using EfficientNetB0 with fine-tuning is to enhance its ability to learn domain-specific features for brain tumor classification. By fine-tuning the final layers, the proposed model achieves a balance between computational efficiency and specialization, which distinguishes it from approaches relying only on feature extraction. Specifically, during the fine-tuning of EfficientNetB0, only the final fully connected layers were modified, while the pre-trained weights of earlier layers were retained. This approach ensures efficient use of pre-trained features while adapting to the domain-specific dataset. This strategy has significantly improved performance compared to the baseline models.
The proposed model demonstrated outstanding performance in brain tumor classification, achieving an overall accuracy of 0.99 on the test dataset. The model’s precision, recall, and F1-scores across the four tumor classes were similarly high, with the precision for class 0 reaching 1.00 and the F1-scores for classes 1, 2, and 3 all exceeding 0.98. These results indicate that the model is highly accurate and consistent in correctly identifying and classifying the different types of brain tumors present in the dataset.
Given these strong results, the proposed model holds significant potential for incorporation into clinical settings, where it could assist radiologists and clinicians in diagnosing brain tumors with a high degree of accuracy. Classifying tumors into specific categories can be crucial in determining the most appropriate treatment plans. For example, different types of tumors may require distinct surgical approaches, radiation therapies, or chemotherapy regimens. The model could aid in the diagnostic process by providing an automated and reliable classification, allowing for more effective treatment strategies.
In the future, efforts will be made to enhance model interpretability through the use of explainable AI techniques, such as heatmaps and saliency maps. These methods highlight the regions of MRI scans that influence the model's predictions, making the decision-making process more transparent and easier for healthcare professionals to trust and use. To ensure regulatory compliance, the model will be aligned with established medical device standards, including the FDA's guidelines for AI in healthcare. Extensive validation on external datasets will be conducted, and collaborations with clinical institutions will be pursued to ensure that the model meets regulatory requirements and can be practically deployed. For deployment in resource-constrained environments, the EfficientNetB0 architecture was selected for its computational efficiency, and further optimization using techniques such as quantization and pruning will be carried out to make the model more suitable for low-power devices and more accessible across various healthcare settings.
Future work could also involve further refinement and validation of the model in a clinical environment, particularly through collaboration with medical professionals, to ensure that the model’s outputs align with clinical needs. Additionally, integrating the model with real-time detection capabilities, such as those provided by YOLOv8, could enhance its utility by allowing for simultaneous tumor localization and classification. This would give clinicians comprehensive diagnostic information in a single, streamlined process. Expanding the model’s capabilities to include additional tumor types or other medical imaging modalities could also be explored, further broadening its application in healthcare.