The results will be presented systematically, beginning with the grid search results for the validation and testing subsets. Following this, the top five models from all training iterations will be discussed, and detailed performance metrics for these models will be provided.
3.1. Grid Search
During this phase, a total of 81 training sessions were conducted, encompassing all possible combinations of the hyperparameters outlined in the previous section. The results of the validation subset, recorded during the training process, are summarized in
Figure 3. Furthermore, the results of the testing subset, evaluated after training, are shown in
Figure 4.
Supplementary Material Tables S1 and S2 contain detailed tabulated results for further reference.
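The exhaustive enumeration of hyperparameter combinations can be sketched as follows. The batch sizes, layer counts, and two of the learning rates appear elsewhere in this section; the third learning-rate value and the grid's exact composition are assumptions made so that the grid yields the stated 81 (3^4) sessions.

```python
from itertools import product

# Hyperparameter grid; the third learning-rate value is an assumption,
# chosen only so that each hyperparameter has three candidate values.
grid = {
    "batch_size": [10, 20, 30],
    "learning_rate": [0.00001, 0.0001, 0.001],
    "conv_layers": [2, 3, 4],
    "dense_layers": [1, 2, 3],
}

# Enumerate every combination: 3 * 3 * 3 * 3 = 81 training sessions.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combinations))  # 81
```

Each dictionary in `combinations` would then parameterize one training session of the grid search.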
From a global perspective, it is evident that the models with lower learning rates (e.g., 0.0001) and moderate batch sizes (e.g., 20) tended to achieve a more favorable balance between accuracy and generalization. This characteristic is particularly significant in the context of brain tumor detection, where a model’s ability to generalize effectively to new and unseen data is crucial for its clinical applicability.
For instance, a model trained with a learning rate of 0.0001 and a batch size of 20 demonstrated a validation loss of 0.35 and a validation accuracy of 90.5%. This specific combination of parameters not only facilitated improved convergence during the training process but also contributed to the model’s stability, enabling more gradual and precise weight updates. Such behavior suggests enhanced generalization to unseen data, which is essential for accurate tumor detection across varying imaging conditions and diverse patient populations.
In contrast, models utilizing higher learning rates (e.g., 0.001) exhibited rapid convergence; however, they often displayed a significant disparity between training accuracy and validation accuracy, indicative of overfitting. Overfitting occurs when a model becomes overly attuned to the training data, capturing noise and idiosyncrasies specific to that dataset rather than learning patterns that are broadly generalizable. Consequently, these models tend to perform poorly when applied to new datasets.
The stability observed in models with lower learning rates and moderate batch sizes can be attributed to the fact that these settings allowed the model to make finer adjustments during the training process. A moderate batch size, such as 20, provides an optimal balance between the frequency of weight updates and the stability of those updates. This is particularly critical in deep convolutional neural networks, where abrupt adjustments can lead to destabilization during training and prevent the model from achieving a global optimum.
The top-performing models achieved accuracies exceeding 90%, demonstrating exceptional performance in the task of image classification for brain tumor detection. These results are particularly significant in the clinical setting, where precision and accuracy are critical for ensuring diagnostic confidence. For instance, a model configured with a batch size of 20, a learning rate of 0.0001, three convolutional layers, and two dense layers achieved a precision of 97% and an accuracy of 98%. These metrics underscore the model’s ability to consistently make correct predictions, accurately identifying both true positives and true negatives when applied to the test dataset.
This model’s configuration enabled it to capture complex and detailed features within the MRI images, which is essential for accurately distinguishing between healthy and tumor tissues. A precision of 97% indicates that the model correctly identified 97% of the positive cases, minimizing the margin of error and reducing the likelihood of false positives. Furthermore, an accuracy of 98% signifies that the model effectively maintained a strong balance in correctly identifying both healthy and tumor states. This is crucial for avoiding misdiagnoses, which could otherwise lead to inappropriate treatments or delays in medical intervention. Such high performance metrics are vital in ensuring that the model can be reliably integrated into clinical practice, where accurate and timely diagnosis is paramount.
Moreover, the precision and sensitivity metrics provided a more nuanced understanding of the models’ performances. For instance, a model configured with a batch size of 10, a learning rate of 0.001, four convolutional layers, and two dense layers achieved a precision of 96% and a sensitivity of 98%. The 96% precision indicates the model’s capability of accurately classifying images as positive, thereby minimizing the occurrence of false positives. On the other hand, the sensitivity of 98% demonstrates the model’s effectiveness in correctly identifying images that actually contained tumors, thus reducing the likelihood of false negatives. This balance between precision and sensitivity is critical, as it ensures that the model is not only accurate in its positive predictions but also reliable in detecting all tumor cases.
This model, with its specific configuration, successfully achieved a balance between identifying positive cases and minimizing false positives. This balance is particularly significant in medical applications, where false positives can lead to unnecessary invasive procedures and increased patient anxiety, while false negatives may result in the failure to provide timely treatment for patients with brain tumors. The model’s ability to maintain high sensitivity ensures that the vast majority of tumor cases are detected, which is essential for effective disease management and treatment.
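The precision and sensitivity figures discussed above follow directly from the confusion-matrix counts. As a minimal sketch, the hypothetical counts below are chosen only so the metrics land near the reported 96% precision and 98% sensitivity; the actual test-set counts for this model are not given in the text.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are truly positive."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Fraction of actual positives that the model detects (recall)."""
    return tp / (tp + fn)

# Hypothetical counts for illustration only (not the model's real counts).
p = precision(tp=98, fp=4)    # ~0.96: few false positives
s = sensitivity(tp=98, fn=2)  # 0.98: few missed tumors
```

High precision with slightly lower sensitivity (or vice versa) is exactly the trade-off the grid search surfaces across configurations.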
3.2. Searching for the Best Models
The models with the best accuracy scores stood out for their ability to consistently make correct predictions.
Table 3 shows the results of the models with the highest accuracy.
These models demonstrated consistent performance, with accuracies exceeding 95%, underscoring their robustness and reliability in classifying magnetic resonance images for brain tumor detection. These high accuracy rates are particularly significant, as they indicate the models’ ability to effectively differentiate between images of healthy brains and those with tumors, thereby minimizing both false positives and false negatives.
Additionally, these models generally employed moderate batch sizes and medium learning rates, which suggests an optimal balance between training efficiency and model stability. Moderate batch sizes, such as 20 or 30, enabled efficient training by processing a sufficient number of samples in each iteration to develop a robust representation of the data, while also minimizing the introduction of excessive noise. This approach helped stabilize model weight updates, allowing for gradual and controlled adjustments that prevent abrupt oscillations that could destabilize the training process.
Medium learning rates, such as 0.0001 or 0.001, also played a crucial role in the performance of these models. A medium learning rate ensures that the model makes meaningful progress during each training iteration without making overly large adjustments that could lead to overfitting. This balance is essential for maintaining both the accuracy and generalizability of the model. The use of these learning rates allowed the models to effectively adapt to the training data, thereby enhancing their ability to generalize to new test data.
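The stability argument above can be illustrated on a toy quadratic loss; this is a deliberately simplified sketch of gradient descent, not the paper's training procedure, and the loss and step counts are assumptions.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimise the toy loss f(w) = w**2 (gradient 2*w) with plain
    gradient descent, to illustrate step-size stability."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = abs(gradient_descent(0.0001))  # drifts smoothly toward the optimum w = 0
large = abs(gradient_descent(1.5))     # overshoots every step and diverges
```

A small step size yields the gradual, controlled weight updates described above, while an overly large one produces exactly the abrupt oscillations that destabilize training.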
The models with the best precision scores excelled in minimizing false positives.
Table 4 presents the five models with the highest precision.
The model configured with three convolutional layers and three dense layers achieved an impressive precision of 98.8%, underscoring its remarkable efficiency in correctly identifying brain tumor cases. This configuration enabled the model to extract intricate and complex features from the MRI images, enhancing its ability to distinguish between healthy and tumor tissues. The three convolutional layers were instrumental in capturing varying levels of image features, ranging from basic edges and textures in the initial layers to more complex and tumor-specific patterns in the deeper layers. The subsequent three dense layers consolidated this information, enabling the model to make precise final classification decisions.
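The hierarchical feature extraction described above can be sketched by tracking feature-map sizes through the convolutional blocks. The input resolution, 'same' padding, and 2x2 max pooling after each convolution are assumptions, since those architectural details are not specified in this section.

```python
def feature_map_sizes(input_size, conv_blocks, pool=2):
    """Spatial size after each conv block, assuming 'same'-padded
    convolutions each followed by pool x pool max pooling."""
    sizes = [input_size]
    for _ in range(conv_blocks):
        sizes.append(sizes[-1] // pool)  # conv keeps size; pooling halves it
    return sizes

# Assumed 224x224 MRI input through three conv blocks.
feature_map_sizes(224, 3)  # [224, 112, 56, 28]
```

Each halving of the spatial grid lets deeper layers respond to larger, more abstract structures in the image, which is why later layers capture tumor-specific patterns rather than edges and textures.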
In addition to their high precision, these models employed configurations that prioritized high specificity, a critical factor in clinical applications. High specificity indicates a low false-positive rate, ensuring that the model accurately identifies images that do not contain tumors. In a clinical context, this is particularly important, as false positives can lead to misdiagnosis, causing unnecessary anxiety for patients and potentially resulting in unwarranted invasive procedures.
The models’ ability to minimize false positives enhances confidence in their predictions, ensuring that only cases with a high probability of tumor presence are flagged for further diagnosis and treatment.
Collectively, these models demonstrate that well-optimized configurations can achieve exceptional levels of accuracy and specificity, which are crucial for clinical applications. These models are particularly valuable in the context of brain tumor detection, where each classification decision carries significant implications for patient health. The high specificity of these models ensures that patients are not subjected to unnecessary procedures due to false positives, while their high accuracy ensures that true-positive cases are identified and treated promptly.
The models with the highest sensitivity scores were exceptionally effective in identifying true positives.
Table 5 presents the top five models based on this metric, highlighting their effectiveness in accurately detecting tumor cases:
The model with three convolutional layers and one dense layer achieved an impressive sensitivity of 99%, demonstrating its exceptional ability to detect the majority of positive cases of brain tumors in MRI images. This specific configuration enabled the model to extract a vast array of detailed and complex features from the images, capturing both fine patterns and more abstract structures that were indicative of the presence of tumors. The convolutional layers were instrumental in identifying and amplifying different levels of image features, ranging from edges and textures to more complex structures, while the dense layer effectively integrated this information to make accurate and well-informed decisions in the final classification.
The high sensitivity of this model indicates its remarkable effectiveness in detecting true positives, meaning it accurately identifies images that actually contain tumors. This is particularly crucial in the clinical setting, where missing a tumor can have severe consequences for a patient’s health. The ability to detect nearly all positive cases ensures that patients with brain tumors receive the correct diagnosis and timely treatment, significantly improving the chances of early intervention and potentially saving lives.
Importantly, these models often employed settings that favored positive detection, even at the expense of a higher number of false positives. This strategy of prioritizing sensitivity over precision may be advantageous in the medical field, where erring on the side of caution is preferable. In other words, it is more acceptable to have false positives that can be ruled out through additional testing than to miss a positive case, which could result in a critical omission of treatment. The priority of these models is to ensure that no case of brain tumor goes undetected, which is vital for patient safety and well-being.
The emphasis on high sensitivity also aligns with an early-detection strategy, where identifying potential brain tumor cases as early as possible can significantly impact treatment outcomes. The ability of these models to detect tumors at early stages increases the likelihood of early intervention and expands treatment options, which is crucial for improving the long-term prognosis of patients.
The models that achieved the best combination of precision, sensitivity, and accuracy represented an optimal balance across all evaluated metrics.
Table 6 below presents the results of the five models with the highest average percentages across these three key metrics, highlighting their overall performance and reliability in clinical applications.
From a comprehensive perspective, the model configured with a batch size of 30, a learning rate of 0.001, three convolutional layers, and one dense layer emerged as the top performer, achieving an average score of 98.3%. This model demonstrated a precision of 97.5%, a sensitivity of 99.2%, and an accuracy of 98.2%. This specific configuration enabled the model to capture detailed and complex image features, enhancing its ability to distinguish between healthy and tumor tissues. High sensitivity ensures that nearly all tumor cases are detected, while high precision minimizes false positives—an essential consideration in clinical settings, to prevent misdiagnosis and avoid unnecessary procedures.
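For clarity, the average score reported here is the arithmetic mean of the three metrics (this averaging convention is assumed from the fact that it reproduces the stated 98.3%):

```python
# Metrics reported for the top model: precision, sensitivity, accuracy.
precision_pct, sensitivity_pct, accuracy_pct = 97.5, 99.2, 98.2

# Unweighted arithmetic mean of the three metrics.
average_pct = (precision_pct + sensitivity_pct + accuracy_pct) / 3
print(round(average_pct, 1))  # 98.3
```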
The second-best model, configured with a batch size of 30, a learning rate of 0.001, two convolutional layers, and three dense layers, achieved an average score of 98.1%. This model also exhibited high levels of precision (98.4%) and sensitivity (98.0%), reflecting its ability to accurately identify both positive and negative cases. The combination of two convolutional layers and three dense layers provided an optimal balance between feature extraction and precise classification, resulting in excellent generalization to unseen data.
The third model, configured with a batch size of 10, a learning rate of 0.001, two convolutional layers, and three dense layers, attained an average score of 98.1%, with a precision of 97.9% and a sensitivity of 98.4%. Despite the smaller batch size, this configuration maintained high generalizability and accuracy in brain tumor detection. The smaller batch size allowed for more frequent weight updates, which may have contributed to more stable and faster convergence during training.
The fourth model, with a batch size of 20, a learning rate of 0.0001, two convolutional layers, and one dense layer, obtained an average score of 97.9%. This model demonstrated a precision of 98.0% and a sensitivity of 98.0%, indicating a well-balanced ability to detect positive cases while minimizing false positives. The lower learning rate allowed for finer adjustments of the weights during training, which likely contributed to the model’s stability and accuracy.
The fifth model, configured with a batch size of 20, a learning rate of 0.001, four convolutional layers, and one dense layer, also achieved an average score of 97.9%. This model displayed a precision of 98.4% and a sensitivity of 97.6%, reflecting its ability to handle complex features through a deeper network architecture. The four convolutional layers facilitated extensive feature extraction, while the final dense layer consolidated this information for accurate classification decisions.
The confusion matrices depicted in
Figure 5 provide a detailed visualization of the performance of these five best models, allowing for an examination of true positives, true negatives, false positives, and false negatives. These matrices were instrumental in understanding the specific strengths and weaknesses of each model in terms of their classification capabilities.
The first model demonstrated 233 true negatives, 211 true positives, two false negatives, and two false positives, reflecting its excellent ability to minimize errors while maintaining high accuracy and sensitivity. The second model showed 224 true negatives, 218 true positives, three false negatives, and three false positives, also exhibiting a strong balance between accuracy and sensitivity.
The third model achieved 231 true negatives and 214 true positives, with only one false positive and two false negatives, standing out for its notably low false positive rate. The fourth model recorded 219 true negatives, 223 true positives, two false negatives, and four false positives, indicating its strong performance in tumor detection.
Finally, the fifth model demonstrated 219 true negatives and 224 true positives, with three false negatives and two false positives, reflecting its high accuracy and sensitivity.
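The per-model metrics follow mechanically from these confusion-matrix counts. As a sketch, the helper below applies the standard definitions to the first model's reported counts; the function name and dictionary layout are illustrative choices, not part of the original analysis.

```python
def confusion_matrix_metrics(tp, tn, fp, fn):
    """Precision, sensitivity, and accuracy from binary confusion counts."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Counts reported for the first model's confusion matrix in Figure 5.
m = confusion_matrix_metrics(tp=211, tn=233, fp=2, fn=2)
```

The same call with each model's counts reproduces the comparison across the five finalists.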
After a detailed analysis of the training results and model evaluations, including a thorough examination of the confusion matrices, it can be concluded that the model configured with a batch size of 30, a learning rate of 0.001, three convolutional layers, and one dense layer was the best choice for the final model. This model excelled in minimizing errors—both false positives and false negatives—while maintaining an optimal balance across all key metrics.
The model’s high precision of 97.5% and sensitivity of 99.2% highlight its ability to accurately identify both positive and negative cases, which is crucial for clinical applications where diagnostic precision is essential. Furthermore, the model’s low false-positive rate minimizes the risk of patients undergoing unnecessary procedures due to incorrect diagnoses. The combination of performance metrics, including precision, sensitivity, and binary accuracy, underscores the model’s robustness and reliability for brain tumor detection.
To further validate its performance, the model was tested on an additional external dataset, specifically the Brain MRI Scan Images from the RSNA-MICCAI Brain Tumor competition (
https://www.rsna.org/rsnai/ai-image-challenge/brain-tumor-ai-challenge-2021), consisting of 227 images. The model achieved an accuracy greater than 90%; while slightly lower than the results obtained with the main test subset, this still reinforced the model’s robustness in practical, real-world scenarios. The slight performance decrease suggests some dependence on the primary training dataset, but the results remain high enough to support strong generalizability.