1. Introduction
Gamma Knife stereotactic radiosurgery (GKSRS) is a non-invasive technique used to treat brain tumors, vascular malformations, and other neurological conditions. Its history dates back to the 1940s when neurosurgeon Lars Leksell developed the concept of radiosurgery. In the 1950s, Leksell and Borje Larsson created the first Gamma Knife prototype at the Karolinska Institute in Sweden. The machine uses 192–201 cobalt-60 sources to deliver precise brain radiation [
1].
The first clinical use of Gamma Knife treatment was in 1968, and it began gaining popularity in the 1970s and 1980s [
2]. Initially, it treated hard-to-reach brain tumors and vascular malformations. Now, it addresses neurological conditions like trigeminal neuralgia, epilepsy, and Parkinson’s disease [
3].
Over time, the Gamma Knife has seen technological improvements, including computerized treatment planning and image guidance. Today, it is used in over 700 medical centers worldwide, treating over one million patients per year [
1,
2,
3,
4].
Machine learning (ML) is a subset of AI, enabling computer systems to learn from experience without explicit programming. ML uses algorithms trained on data to recognize patterns and make decisions as humans do [
5,
6].
The three main ML algorithms are:
Supervised learning: Maps input data to known output data for predictions, e.g., linear regression, decision trees, and neural networks.
Unsupervised learning: Identifies patterns and relationships in data, without supervision, e.g., clustering and principal component analysis.
Reinforcement learning: Learns through trial and error, optimizing for cumulative rewards, e.g., Q-learning and deep reinforcement learning.
ML finds applications in natural language processing, image recognition, predictive maintenance, fraud detection, and recommendation systems [
5].
Deep learning, a subfield of ML, employs multi-layer neural networks for complex tasks like image and speech recognition. It excels in computer vision, speech recognition, and natural language processing [
7,
8].
There are various types of deep learning algorithms for different problems [
7,
8]:
Convolutional neural networks (CNNs): Used for image and video recognition, and for analyzing local patterns.
Recurrent neural networks (RNNs): For sequential data, such as speech recognition and time series analysis.
Generative adversarial networks (GANs): Generate data similar to a given dataset, such as realistic images.
Autoencoders: Tasked with image/text compression, feature extraction, and denoising.
Many other algorithm variations exist to solve diverse problems [
7,
8]. Transfer learning (TL) is a technique in which a pre-trained model kickstarts a new task. The model first learns general features from a large dataset, then trains on a smaller, specific sample. TL is beneficial with limited labeled data, leveraging the pre-trained model’s knowledge [
9,
10]. It is used in computer vision, natural language processing, and speech recognition [
11,
12].
There are some new methods, like RCNN (region-based convolutional neural network) [
13] and attention models [
14], which are two different techniques commonly used in computer vision tasks. While they might not be the most typical choices for addressing the problem of early response prediction of brain metastases, they could potentially be applied in creative ways to enhance the performance of predictive models for this specific medical imaging task. RCNN is a family of methods used for object detection and localization in images. It involves dividing an image into multiple regions (proposals) and processing each region separately to detect and classify objects within them. In the context of brain metastases prediction, RCNN-like approaches could be adapted to identify and classify different regions of interest (ROIs) within medical images that indicate the presence of metastatic growth. These regions could correspond to areas with abnormal features indicative of metastases. By processing these ROIs separately, the model could potentially learn to detect early signs of metastases that might be overlooked by more traditional image analysis methods. Attention mechanisms have gained popularity in various deep learning applications, including computer vision and natural language processing. Attention mechanisms help models focus on the most relevant parts of the input data when making predictions. In the context of brain metastases prediction, attention mechanisms could be used to guide the model’s focus to specific regions within the medical images that are more likely to contain early signs of metastases. This could be particularly helpful in identifying subtle patterns or anomalies that might not be immediately apparent to human observers or traditional image analysis methods.
It is important to note that neither RCNN nor attention models are directly designed for the early response prediction of brain metastases. The typical approach for medical image analysis involves techniques such as convolutional neural networks (CNNs) and other deep learning architectures specifically tailored for image classification and segmentation tasks. However, the application of RCNN-like techniques and attention mechanisms in this context could be explored as part of a more advanced and innovative approach to improve the sensitivity and accuracy of early response prediction for brain metastases.
The success of these techniques would depend on factors such as the availability of labeled data, the complexity of the metastases detection task, and the computational resources available for model training and evaluation. It is recommended to work closely with domain experts in radiology and medical imaging when designing and evaluating such models to ensure their clinical relevance and efficacy.
GKSRS is a non-invasive radiation therapy for brain conditions. Machine learning and deep learning enhance SRS in multiple ways [
15,
16,
17,
18,
19]:
Treatment planning: ML helps identify brain structures in medical images (MRI) for precise target delineation.
Dose optimization: ML optimizes radiation doses during planning, balancing efficacy and tissue protection.
Prediction of outcomes: ML predicts SRS treatment outcomes based on patient characteristics.
Quality assurance: ML automates error detection in treatment delivery, enhancing safety and efficacy.
Treatment evaluation: ML assesses treatment effectiveness through patient data, refining protocols.
In summary, machine learning and deep learning improve Gamma Knife stereotactic radiosurgery, leading to a more efficient healthcare system [
18,
19].
In this work, we utilize Google Colaboratory (Google Colab), a cloud-based platform for Python code development via Jupyter notebooks. It offers a free environment for researchers, data scientists, and ML practitioners to analyze data, perform machine learning tasks, etc. [
20]. Google Colab boasts features such as access to free GPUs and TPUs for model training, integration with Google Drive for storage and notebook sharing, and real-time code cell execution with instant feedback. Popular Python libraries like TensorFlow, Keras, and PyTorch are supported [
18,
19].
Data augmentation is employed in ML and computer vision to expand training datasets with varied samples, addressing limited data and overfitting [
21,
22,
23,
24]. ML techniques include flipping, rotating, scaling, cropping, adding noise, and adjusting brightness/contrast. These transformations enhance model robustness and accuracy, with care taken to maintain data representativeness [
24].
Unbalanced datasets pose challenges in ML when one class vastly outweighs the others. Approaches to address this issue include resampling (oversampling/undersampling), class weighting, cost-sensitive learning, ensemble methods, and data augmentation [
25,
26,
27,
28,
29]. The appropriate method depends on the problem and dataset, necessitating evaluation of all classes for accurate results [
25,
26,
27,
28,
29].
Deep learning employs early stopping and callback lists to improve performance and prevent overfitting. Early stopping halts training when validation performance degrades after a set number of epochs, guarding against overfitting. Callback lists execute functions during training, enabling customization and the implementation of techniques like model checkpointing and learning rate scheduling [
25,
26,
27,
28,
29]. Keras supports both early stopping and callback lists, enhancing model training and performance.
In the following sections, we present an AI evaluation of prognostic factors in the evolution of stage-treated metastases based on medical imaging with the Gamma Knife treatment machine from our department, as depicted in the diagram from
Figure 1.
Our present goals are as follows:
Gather and preprocess data: Collect MRI scans from patients with stage-treated metastases who underwent Gamma Knife treatment. Preprocess the data to ensure its analysis readiness, addressing data imbalance using augmentation techniques like SMOTE.
Identify prognostic factors: Use domain expertise and existing research to identify potential factors influencing metastases evolution, including tumor size, location, shape, patient demographics, and clinical history.
Develop the AI model: Select a suitable deep learning algorithm and train it on preprocessed data to predict metastases evolution likelihood, considering the identified prognostic factors and Gamma Knife treatment specifics. Three methods are employed: CNN model from scratch, CNN model with transfer learning, and CNN model with fine-tuning.
Evaluate model performance: Test the AI model on separate data to assess its predictive capabilities using metrics like accuracy, sensitivity, specificity, confusion matrix, and receiver operating characteristics.
Interpret results: Analyze AI model outputs to identify the most important prognostic factors affecting metastases evolution in Gamma Knife-treated patients, using visualizations and statistical analysis to explore relationships between factors.
Validate findings: Verify AI model results with additional data and compare predictions against outcomes of real Gamma Knife-treated patients.
Communicate results: Present the AI evaluation findings clearly, highlighting crucial prognostic factors and their implications for metastases treatment with the Gamma Knife.
Research Gaps and Critical Limitations:
This paper’s study regarding the use of deep learning techniques to predict metastases evolution post-treatment represents a significant step forward; however, there are notable research gaps and limitations that should be acknowledged:
Sample Size and Generalization: One critical limitation is the relatively small sample size used in the study. This raises concerns about the model’s ability to generalize to a larger patient population and diverse clinical settings. Addressing this gap is crucial to ensure the model’s robustness and reliability.
Single-Center Design and Bias: The single-center design of the study introduces the potential for institutional biases and limitations in regards to the diversity of patient cases. Multicenter studies or more diverse datasets are needed to validate the model’s effectiveness across different healthcare settings.
Retrospective Nature and Confounding Factors: The retrospective nature of the data collection may introduce confounding factors that could impact the accuracy and generalizability of the deep learning model’s predictions. Future research should consider prospective designs to minimize such biases.
Motivation, Contribution, and Novelties:
The motivation of the paper lies in leveraging deep learning techniques for predicting treatment outcomes in patients with stage-treated metastases who underwent Gamma Knife radiosurgery. The paper’s contributions and novelties include:
Clinical Application of Deep Learning: The paper introduces the novel application of deep learning in predicting metastases evolution, bridging the gap between advanced machine learning techniques and clinical decision making.
Accurate Outcome Prediction: The study demonstrates that deep learning algorithms can accurately predict treatment outcomes post-Gamma Knife radiosurgery. This insight has significant clinical implications for enhancing treatment decision-making processes.
Future Research:
The identified limitations and the potential impact of the paper’s findings suggest several avenues for future research:
Validation in Larger Cohorts: Future studies should aim to validate deep learning models using larger and more diverse patient cohorts. This will enhance the model’s reliability and ability to generalize to a broader patient population.
Multi-Center Studies: Conducting multicenter studies across different healthcare institutions can help mitigate biases associated with a single-center design and improve the model’s robustness in various clinical settings.
Prospective Studies: Prospective studies can offer more controlled data collection and minimize retrospective biases, thereby strengthening the validity of the deep learning model’s predictions.
Model Interpretability: Exploring techniques for explaining the deep learning model’s predictions could enhance its clinical utility by providing insights into the factors driving specific outcomes.
Generalizability to Other Clinical Contexts: Given the success of the model in predicting metastases evolution, investigating the adaptability of deep learning algorithms to other clinical contexts could expand the scope of its application.
Ethical Considerations: Future research should delve into the ethical implications of using AI-driven predictions in medical decision making, ensuring that patient autonomy, consent, and privacy are upheld.
Integration into Clinical Workflow: As mentioned in the context of an upcoming paper, developing an application for deploying the model in a clinical setting would require research into user interface design, real-time processing, and seamless integration with existing healthcare systems.
In summary, while the paper contributes valuable insights into using deep learning for predicting treatment outcomes in metastases patients, addressing its limitations and pursuing further research avenues will enhance the reliability, generalizability, and practicality of the proposed approach.
3. Discussion and Conclusions
In this paper, we used deep learning techniques to analyze imaging data from patients with stage-treated metastases who underwent Gamma Knife radiosurgery. Our results show that deep learning algorithms accurately predict metastases evolution post-treatment [
32].
However, our study has limitations, including a relatively small sample size, a single-center design, and a retrospective nature, potentially introducing biases and confounding factors.
Despite these limitations, our work provides essential insights into using deep learning techniques to predict treatment outcomes in metastases patients, with clinical implications for treatment decision making and patient outcomes. Future studies should validate our models in larger patient cohorts and explore deep learning algorithms in other clinical contexts.
Radiomics, an image analysis technique used in oncology, enhances diagnosis, prognosis, and clinical decision making for precision medicine. In brain metastases, radiomics identifies smaller metastases, defines multiple larger ones, predicts local response post-radiosurgery, and distinguishes radiation injury from metastasis recurrence. Radiomics approaches achieve high diagnostic accuracies of 80–90% [
32].
Notable papers related to radiomics and machine learning applications in stereotactic radiosurgery include a comprehensive review discussing brain tumor diagnostics, image segmentation, and distinguishing radiation injury from metastasis relapse [
32]. Studies regarding predicting the response after radiosurgery reveal potential, with features like the presence of a necrotic core, the fraction of contrast-enhancing tumor tissue, and the extent of perifocal edema [
33,
34,
35,
36,
37,
38].
The advances in radiomics and deep learning hold promise for precision medicine in brain metastases treatment, enabling precise diagnoses, prognoses, and treatment response monitoring [
39].
Randomized trials demonstrate the benefits of SRS as a standalone treatment for brain metastases, without significant decrease in survival. However, SRS alone associates with higher local failure rates, warranting identification of high-risk patients. Radiomics analyses show potential for predicting local failure and SRS response [
40,
41,
42,
43,
44,
45,
46,
47,
48].
Quantitative imaging features correlate with outcomes after radiation therapy, enhancing personalized cancer care. A multidisciplinary approach integrating radiomics and deep learning is essential in the medical decision-making and radiation therapy workflow for bone metastasis [
48].
Studies by Huang et al. and others explore significant radiomic features related to core volume and sphericity, predicting local tumor control after GKRS [
49]. Machine learning processes predict the brain metastasis response to GKRS, with promising accuracy [
50].
Cha et al. developed a radiomics model based on a convolutional neural network to predict the response to SRT for brain metastases, achieving promising results with ensemble models [
37].
In conclusion, our deep learning approach accurately predicts metastases evolution. Radiomics and machine learning are promising tools for improving brain metastases treatment. Validation studies and improved integration in clinical workflows are needed to maximize their potential [
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51].
Computational software related to applied fractal analysis used in this study was initiated and then successfully developed in the articles of some of the authors mentioned in the bibliography [
51,
52,
53,
54,
55].
Classification reports (
Table 3,
Table 5 and
Table 7) for three different deep learning models (CNN_model, TL_model, and FT_model) present performance metrics for a binary classification problem with two classes: “progression” and “regression”.
CNN_model:
“Progression” class: Precision = 0.94, Recall = 1.00, F1-score = 0.97, Accuracy = 0.98.
“Regression” class: Precision = 1.00, Recall = 0.98, F1-score = 0.99, Accuracy = 0.98. Overall, the CNN_model performed very well, with high precision, recall, and F1-scores for both classes.
TL_model:
“Progression” class: Precision = 0.78, Recall = 0.98, F1-score = 0.86, Accuracy = 0.92.
“Regression” class: Precision = 0.99, Recall = 0.90, F1-score = 0.94, Accuracy = 0.92. The TL_model performed well, with high recall for the “progression” class, but its precision could be improved.
FT_model:
“Progression” class: Precision = 0.62, Recall = 1.00, F1-score = 0.76, Accuracy = 0.83.
“Regression” class: Precision = 1.00, Recall = 0.77, F1-score = 0.87, Accuracy = 0.83. The FT_model showed high precision for the “regression” class and high recall for the “progression” class. However, there may be some misclassification in these cases, and the F1-scores indicate a tradeoff between precision and recall.
In summary, all three models demonstrated good to very good performance, but there is room for improvement in certain aspects. Further analysis, such as examining the confusion matrix, may provide additional insights into the models’ strengths and weaknesses.