1. Introduction
Medical researchers and practitioners play a critical role in disease prediction, diagnosis, and treatment. However, a significant challenge faced by these professionals is the limited availability of adequate and diverse datasets [1,2]. The success of predictive models in medical disease analysis heavily relies on the quantity of data used for training [3,4]. To address these limitations, researchers have turned to augmentation techniques and the creation of synthetic datasets as practical solutions to expand and enhance the available data.
Some researchers, for example [5], mistake image augmentation for synthetic data generation; however, there is a clear difference between the two. Image augmentation manipulates the training set by changing its geometric and color-space properties (such as rotation, scaling, cropping, brightness, zooming, and contrast) [6]. One major disadvantage of using primary or augmented data is the risk of an accidental breach of patient privacy, as data collected through Magnetic Resonance Imaging (MRI) or Computed Tomography (CT) scans can contain images of the head, face, or comparable representations that allow the identities of research participants to be readily and accurately ascertained [7,8]. Synthetic data, on the other hand, can circumvent this hurdle while also enhancing model performance when the available data are insufficient for training. Synthetic datasets are artificially generated data that closely resemble the characteristics of the primary datasets. The original dataset is not reproduced in the creation of synthetic data; rather, new data are generated artificially to serve as a supplementary or alternative source, particularly in scenarios where authentic data are scarce. Synthetic datasets are typically generated with generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), or with other methodologies, such as rule-based generators or simulation models [9].
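As a concrete illustration of the distinction, classical augmentation can be expressed directly as geometric and color-space transforms applied to existing images. The following is a minimal sketch using Keras' ImageDataGenerator (the directory path, target size, and transform ranges are illustrative assumptions, not the exact settings of this study):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Classical augmentation: each option below manipulates the geometry or
# color space of existing images; no new images are synthesized from scratch.
augmenter = ImageDataGenerator(
    rotation_range=20,             # random rotation
    zoom_range=0.15,               # random zooming
    width_shift_range=0.1,         # horizontal translation (implicit cropping)
    height_shift_range=0.1,        # vertical translation
    brightness_range=(0.8, 1.2),   # brightness perturbation
    horizontal_flip=True,
    rescale=1.0 / 255,
)

# flow_from_directory yields augmented variants of the *original* images;
# the "data/train" path is hypothetical.
train_gen = augmenter.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="binary"
)
```

By contrast, a generative model such as a GAN produces entirely new images sampled from a learned distribution, which is what distinguishes synthetic data from augmented data.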
The utilization of GANs for generating synthetic data in medical research is a relatively novel approach that has not been extensively explored in the existing literature. In the context of brain tumor prediction, for example, GANs have not been widely studied for generating synthetic datasets. While there have been some applications of GANs in medical image synthesis for brain tumor segmentation, in studies by Myronenko et al., Huang et al., and Cirillo et al. [10,11,12], the use of GANs specifically for generating synthetic data to improve brain cancer prediction models remains relatively unexplored. In cardiac disease diagnosis, the availability of large and diverse datasets is crucial for training accurate predictive models [13]. While augmentation techniques have been explored in this domain, notably in a study by Anwar et al. [14], the use of GANs to generate synthetic data for improving cardiac disease diagnosis models remains underexplored. The use of GANs for generating synthetic medical images to augment existing datasets is an emerging field of research (e.g., retinal image synthesis for diabetic retinopathy diagnosis) [15]. However, comprehensive investigations comparing the impact of GAN-generated synthetic images versus traditional augmentation techniques on the performance of medical image analysis models remain limited. While GANs have shown promise in generating synthetic data for various applications, their application specifically to disease prediction and diagnosis remains relatively unexplored territory in the existing literature. This study's focus on using GANs to create synthetic datasets, and on directly comparing their performance with classical augmentation techniques in the context of disease prediction, particularly for brain tumors, contributes to filling this gap and adds valuable insights to the field of medical research.
This study seeks to make several significant contributions to the domain of disease prediction, specifically for brain tumors. Owing in part to increasingly stressful lifestyles worldwide, brain tumor occurrences are on a steady rise, and if a brain tumor is not detected at an early stage, the consequences for patient survival can be severe. The first contribution of this study is to develop a novel Conditional Deep Convolutional Neural Network (C-DCNN) model, built from CNN architectures, that is capable of detecting tumors and differentiating brain tumors from kidney tumors, and then to conduct a thorough comparison of the performance of our novel C-DCNN model with state-of-the-art models such as ResNet50, VGG19, InceptionV3, and VGG16 when trained on augmented datasets versus synthetic datasets. By analyzing the effectiveness of these two approaches, the study offers insights into which method leads to better predictive performance. Secondly, the research assesses the impact of using synthetic datasets on the prediction accuracy of disease models, specifically for brain and kidney tumors. Understanding how synthetic datasets affect the performance of predictive models is essential for developing more accurate and reliable diagnostic tools for medical professionals. Moreover, this study undertakes a comprehensive evaluation of the potential advantages and limitations of utilizing synthetic datasets compared to augmented datasets in disease prediction. Identifying the strengths and weaknesses of each approach can guide researchers and practitioners in making informed decisions regarding dataset selection for their specific medical studies.
2. Related Studies
The concept of GANs was initially presented by Goodfellow and colleagues [16], who employed backpropagation to train multi-layer perceptron generators and discriminators. Over the years, GANs have gained considerable popularity as a result of numerous modifications that have improved the quality of the generated images and expanded their range of applications. The Deep Convolutional Generative Adversarial Network (DCGAN), proposed by Radford et al. [17], aims to lessen the distinction between Convolutional Neural Networks (CNNs) utilized for supervised and unsupervised learning. The Least Squares Generative Adversarial Network (LSGAN), introduced by Mao et al. [18], uses a least squares loss function for the discriminator to increase both training stability and the quality of the generated images. Study [19] proposed the Conditional Generative Adversarial Network (CGAN), in which the generator and discriminator are conditioned on class labels during training [20]. By combining the DCGAN and the CGAN, the Conditional Deep Convolutional Generative Adversarial Network (C-DCGAN) was created; Luo et al. [21] accomplished feature extraction from the C-DCGAN model through a convolutional neural network, with a conditional extension included to enhance the data. The InfoGAN model, introduced by Chen et al. [22], is built on an information-theoretic framework that allows for manipulation of the latent space; the sleep–wake algorithm is also incorporated into the InfoGAN training process [23]. Similarly, the Wasserstein Generative Adversarial Network (WGAN), proposed by [24], was developed as an improved training technique to solve the issues associated with mode collapse and to provide improved learning curves. As stated by Dharanya et al. [25], label conditioning is an essential element of the Auxiliary Classifier Generative Adversarial Network (ACGAN). Study [26] developed the Energy-Based Generative Adversarial Network (EBGAN), which produces low-energy samples by incorporating energy functions into both the generator and the discriminator. Hjelm et al. [27] developed the Boundary-Seeking Generative Adversarial Network (BGAN), which trains the generator by assessing the difference between the generated and target images; study [28], on the other hand, presented the Boundary Equilibrium Generative Adversarial Network (BEGAN), a framework that adds an equilibrium-based technique to the Wasserstein GAN (WGAN) framework.
Several notable applications of GANs have been reported in existing studies, including object detection [29], handwriting recognition [30], facial age progression [31] as a concept related to face recognition, super-resolution imaging [32], visual saliency prediction [33], and unsupervised domain adaptation [34]. Researchers have used GANs for data augmentation owing to their impressive performance in image synthesis. Initially, these techniques were used to enhance the overall quality of images; later, they were used for additional training and the generation of synthetic data. GANs have also been used to add emotions to neutral faces to increase the number of underrepresented categories. Previous studies exploring augmentation techniques and synthetic datasets have used GANs in various domains. For example, Abdulraheem et al. [35] leveraged GAN models to develop additional datasets for the automatic recognition of expiration dates in images, a task that demands a significant amount of data for learning. To augment the data, the study implemented and evaluated state-of-the-art GAN models, including WGAN with Gradient Penalty (WGAN-GP), Wasserstein Divergence GAN (WGAN-DIV), Multi-Modal GAN (MMGAN), Non-Saturating GAN (NSGAN), Least Squares GAN (LSGAN), Deep Regret Analytic GAN (DRAGAN), Auxiliary Classifier GAN (ACGAN), Deep Convolutional GAN (DCGAN), Energy-Based GAN (EBGAN), Boundary Equilibrium GAN (BEGAN), and the Variational Autoencoder (VAE), which, although not a GAN, is used for data generation by learning a probabilistic latent space. Their method demonstrated that GAN-generated datasets are particularly useful in boosting the overall performance of object identification applications, which is a very promising development. Similarly, in the health domain, Srivastav et al. [36] used deep learning algorithms to classify chest X-ray images for diagnosing diseases such as pneumonia. To improve model performance, GANs were trained to generate synthetic images, which were then used to oversample the dataset. Image classification was performed using VGG16 as the base model, which attained an accuracy of 94.5% on the validation set when using the augmented datasets. Another study, by Qin et al. [37], used GAN-generated synthetic datasets alone for the detection of lung diseases; the applied network architectures included state-of-the-art ResNet, DenseNet, EfficientNet, and CNN models, and the results showed that GAN-generated synthetic datasets achieved improved classification performance. Additionally, for the classification and timely detection of liver lesions, Frid-Adar et al. [38] used GAN-based synthetic medical augmentation to increase the performance of a CNN model to 85.7% sensitivity and 92.4% specificity.
The benefits of GAN-based image synthesis and augmentation for brain cancer were highlighted in a review of data augmentation techniques for brain tumor segmentation by Nalepa et al. [39]. Building on this, Safdar et al. and Srinivas et al. [40,41] used these methods to improve brain tumor classification with state-of-the-art models such as VGG-16 and ResNet-50. Despite these successes in existing studies, there remains a gap in the literature concerning a comprehensive comparison of these two approaches (synthetic versus augmented datasets) for medical disease prediction, particularly for brain cancer. Some studies have reviewed the use of augmented and synthetic data independently, but they fail to address the critical question of which method is more suitable and effective for medical disease prediction. For instance, the authors of Iglesias et al. [42] conducted a review of different studies that used augmented and synthetic data, but the study's scope was too broad and lacked an application-specific focus. On the other hand, Mirshekarian et al. [43] compared real data with synthetic data for the prediction of blood glucose, but their study did not compare machine learning models with state-of-the-art models, and its aim differed from the premise of the present study. Thus, the need for a study that directly compares augmented and synthetic image datasets using GANs in the context of medical disease prediction remains unaddressed in the existing literature. This research aims to fill this gap and provide insights that have been long awaited in the medical research community.
To the best of our knowledge, no previous study has explored this particular aspect in the literature. Some studies, such as Gupta et al. and Mukherkjee et al. [44,45], have employed GANs to increase data size for classification tasks using existing machine learning models, but without making any clear comparison between synthetic and augmentation techniques or correcting the impression that they are the same, making this investigation an original and pioneering effort in the field of disease prediction. By adopting GANs, specifically the sophisticated DCGAN, to generate synthetic datasets of brain and kidney tumors and conducting a direct comparison with augmented datasets, this research takes a unique approach to addressing the scarcity of data in medical research and to preventing accidental breaches of the privacy of patients represented by the data. This research also corrects the widely held impression that image augmentation is the same thing as synthetic data generation. In addition, this study builds a novel C-DCNN model from CNN architectures for the classification of brain and kidney tumors and conducts a thorough comparison of the performance of our novel C-DCNN model with state-of-the-art models, such as ResNet50, VGG19, InceptionV3, and VGG16, when trained on augmented versus synthetic datasets. The results and insights obtained from this study not only contribute to enhancing predictive accuracy in disease prognosis but also pave the way for further innovations and advancements in medical research using synthetic or augmented data.
4. Results
This study performed model training and evaluation on the augmented datasets and the generated synthetic dataset using the Keras package and the Python programming language in the Jupyter Notebook environment. The pre-processed data, totaling 40,000 images, were split into a training set (80%) and a test set (20%) for both the synthetic and augmented data. These sets were then fed into the novel C-DCNN model. The model's performance was assessed by validating it on a randomly selected 20% subset of the training data and later on new, larger, and more diverse datasets of over 18,200 images to confirm our findings. This approach ensured that the model was robust and generalized well to unseen data, highlighting the effectiveness of using both augmented and synthetic data in training. The model was trained for 50 epochs. We applied dropout regularization after the third max pooling layer and in the dense layers. Dropout is a straightforward regularization technique: by deactivating certain neurons during training, it yields a simpler yet effective network, and the reduced complexity mitigates overfitting. Two callbacks were implemented to enhance the training process, optimize model performance, and prevent overfitting. The first, EarlyStopping, monitors the validation loss and terminates training early if the loss does not improve for a specified number of epochs. The second, ReduceLROnPlateau, reduces the learning rate when the validation loss does not improve for a certain number of epochs, with a patience value of three.
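A minimal sketch of this training configuration follows. The variables X, y, and the compiled C-DCNN model are assumed to exist, and the batch size and EarlyStopping patience are illustrative (the text above specifies only the ReduceLROnPlateau patience of three):

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split

# 80/20 train/test split of the pre-processed images (X) and labels (y).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

callbacks = [
    # Stop training early if the validation loss stops improving.
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Reduce the learning rate after 3 stagnant epochs, as described above.
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
]

# model is the compiled C-DCNN; 20% of the training data is held out
# for validation, and training runs for up to 50 epochs.
history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=callbacks,
)
```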
4.1. Validation Metrics
Performance evaluation metrics play a vital role in the development, testing, and deployment of machine learning models, enabling the creation of more precise and efficient AI solutions [54]. They offer a means to objectively assess a model's accuracy, precision, sensitivity, specificity, and other performance parameters, and they enable comparisons across different models. Such metrics are also crucial for enhancing the transparency and interpretability of machine learning models, which is vital for establishing confidence in these systems. This study assessed the predictive efficacy of the novel C-DCNN model by employing performance evaluation metrics on both the synthetic and augmented datasets. The results for both datasets demonstrate that the C-DCNN model achieved exceptional accuracy in detecting and classifying brain and kidney tumors. The augmented dataset yielded a precision of 99% for both types of tumors, while the synthetic dataset likewise achieved a precision of 99% for brain and kidney tumors, as indicated in Table 3. This exceptional precision score suggests the model's ability to accurately detect brain and kidney tumors in the majority of instances. The recall score of 99% for brain and kidney tumors in both the synthetic and augmented datasets indicates that the model accurately identified nearly all positive instances; this demonstrates a highly sensitive model for both artificially generated and augmented datasets, with significant therapeutic relevance in precisely identifying brain and kidney tumors in MRI images. The specificity score of 99% for brain and kidney tumors, obtained on both the synthetic and augmented datasets, indicates that the model accurately identified nearly all negative cases, effectively determining the absence of a tumor. Moreover, the F1 score of 99% achieved by the model for brain and kidney tumors, using both synthetic and augmented datasets, demonstrates its ability to maintain a favorable equilibrium between precision and recall. Consequently, the model accurately detected instances of brain and kidney tumors while effectively reducing the occurrence of false positive results. The accuracy score of 99% for both the augmented and synthetic datasets demonstrates that the model can accurately categorize the synthetic and augmented MRI images of brain and kidney tumors and that there is no difference in model performance between the two approaches. This further encourages the use of synthetic datasets for classification tasks, given their enormous benefits, especially in enlarging primary datasets and circumventing possible breaches of data privacy. The C-DCNN model shows exceptional efficacy in detecting and precisely categorizing brain and kidney tumors in MRI images. This has profound therapeutic consequences, especially in areas with restricted availability of specialized medical services and resources, as it enables prompt and accurate diagnoses. The C-DCNN model also demonstrates the potential to be a powerful tool for detecting and categorizing brain and kidney tumors using CT images.
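For reference, these metrics can be computed from the model's test-set predictions as in the following sketch. It assumes scikit-learn, binary labels, and the variables from the training step above; the 0.5 decision threshold is an illustrative convention:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
)

# y_prob: predicted probabilities from the trained model on the test set.
y_prob = model.predict(X_test).ravel()
y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("Accuracy   :", accuracy_score(y_test, y_pred))
print("Precision  :", precision_score(y_test, y_pred))
print("Recall     :", recall_score(y_test, y_pred))  # sensitivity
print("Specificity:", tn / (tn + fp))                # true negative rate
print("F1 score   :", f1_score(y_test, y_pred))
```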
4.2. Confusion Matrix
A confusion matrix is a widely used statistical tool for assessing the performance of models. It aids in assessing the efficacy of our model in accurately categorizing all the images into two distinct classes. The matrix displays the numbers of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions generated by the model. A confusion matrix is crucial for assessing the accuracy and efficacy of models, as it offers a concise visual depiction of model performance [55]. Through a thorough analysis of the matrix, one can identify the model's strengths and weaknesses and implement appropriate enhancements to ensure more precise and dependable diagnoses. The C-DCNN model yielded excellent results, as seen in the confusion matrix. For the synthetic datasets, the model accurately identified 4028 MRI images as brain tumors (true positives) while incorrectly classifying 5 images as kidney tumors (false positives), as depicted in Figure 6. For the augmented datasets, the model accurately identified 4033 MRI images as brain tumors (true positives), with no instances where kidney tumors were mistakenly categorized as brain tumors (false positives). This outcome demonstrates that the model exhibits a high level of precision and dependability in the diagnosis of brain tumors. Furthermore, for the synthetic datasets, the model accurately classified 3963 MRI images as kidney tumors (true negatives) and misclassified 4 as brain tumors (false negatives). Similarly, for the augmented dataset, the model correctly classified 3966 MRI images as kidney tumors (true negatives) and misclassified only 1 as a brain tumor (false negative), as depicted in Figure 7. The confusion matrix obtained from the C-DCNN model's detection and classification of MRI images holds great significance for medical diagnosis.
4.3. Learning Curve
Learning curves are a crucial tool for assessing the effectiveness of these models. They offer insights into a model's accuracy and error throughout the training process, enabling the detection of potential performance issues and providing guidance for improving the model. The accuracy learning curve illustrates the progression of the model's accuracy on both the training and validation datasets over time and can reveal whether the model is overfitting or underfitting the training data. An overfitting model exhibits high accuracy on the training set but markedly lower accuracy on held-out data, indicating a need for better generalization to new and unfamiliar data. An underfitted model has poor accuracy on both the training and validation data, suggesting that it needs further improvement to capture the patterns within the data. By monitoring the accuracy learning curve, it becomes feasible to determine the ideal number of training epochs and to ensure that the model is neither overfitting nor underfitting. The C-DCNN model demonstrated a robust fit and successfully captured the inherent patterns in the data without overfitting or underfitting; the progressive enhancement in training accuracy throughout the epochs reflected the model's acquisition and refinement of knowledge.
Figure 8 illustrates that training began gradually at epoch 0 and consistently improved, reaching a training accuracy of approximately 99.81% by epoch 25 of 50 for the synthetic dataset and approximately 99.97% by epoch 23 of 50 for the augmented datasets. It is equally important to assess the model's performance on the validation data, which consist of new and unseen samples, to guarantee that the model is well balanced with respect to the training data. Like the training accuracy, the validation accuracy initially progresses slowly and shows minor fluctuations. Nevertheless, it gradually rises and reaches a plateau at epoch 25 for the synthetic datasets, with an accuracy of 99.89%; similarly, for the augmented datasets, it plateaus at epoch 23 with an accuracy of 99.99%. This suggests that additional training is unlikely to enhance the model's performance on fresh data. The model loss learning curve depicts the variation in the model's loss function during training across epochs; the training loss diminishes progressively as the model fits the data better, indicating positive progress. The initial training loss for the synthetic datasets was 0.3383 at epoch 1 and consistently decreased to 0.0065 by epoch 25; similarly, for the augmented datasets, the training loss started at 0.2408 at epoch 1 and steadily decreased to 0.0017 by epoch 23. This suggests that the model improved its capacity to predict the desired outcome with a smaller margin of error. By contrast, the validation loss for the synthetic dataset exhibited a continuous downward trend from epochs 1 to 17, followed by variability between epochs 18 and 24, before stabilizing at epoch 25 without any further consistent drop. The validation loss for the augmented datasets declined steadily from epochs 1 to 13, fluctuated between epochs 14 and 20, and then stabilized at epoch 21; from epoch 21 onwards, it continued to decrease until reaching a plateau at epoch 23. The validation loss quantifies the discrepancy between the predicted and actual outputs on a held-out dataset not used for training, thereby measuring the model's performance on previously unseen data. An optimal fit was attained when the model's training and validation losses reached a steady state after multiple epochs. Low training loss combined with high validation loss indicates overfitting, while large values of both losses may suggest underfitting. Hence, it is imperative to assess the model's performance and determine the necessary enhancements by scrutinizing the learning curves of model accuracy and model loss.
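A minimal sketch for plotting such curves from the Keras History object returned by model.fit (the history variable is assumed from the training step above) is given below:

```python
import matplotlib.pyplot as plt

# history is the Keras History object returned by model.fit above.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set_xlabel("epoch"); ax1.set_ylabel("accuracy"); ax1.legend()

ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set_xlabel("epoch"); ax2.set_ylabel("loss"); ax2.legend()

plt.tight_layout()
plt.show()
```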
4.4. Receiver Operating Characteristic (ROC) Curve
The area under the ROC curve (AUC-ROC) is a crucial assessment metric for classification tasks. It offers a visual depiction of the diagnostic accuracy of a classification model and provides a comprehensive evaluation of a classifier's performance across all potential classification thresholds. The AUC-ROC jointly reflects sensitivity (true positive rate) and specificity (true negative rate) and offers a single numerical value that succinctly represents the classifier's capacity to differentiate brain tumors from kidney tumors; a higher AUC-ROC indicates superior model performance. The Receiver Operating Characteristic (ROC) curve is a graphical representation of a binary classification model's performance as the discrimination threshold is varied, while the area under the curve (AUC) is a numerical measure that summarizes the overall performance of a binary classification model based on its ROC curve. The AUC value measures the overall ability of the model to distinguish between classes, where a value of 1 indicates perfect classification and 0.5 suggests random chance [56]. An AUC-ROC of 1.00 for both the synthetic and augmented datasets, as depicted in Figure 9, indicates that the C-DCNN model possesses a strong ability to distinguish between brain tumors and kidney tumors. This means the model can separate true positive cases (correctly identified MRI images) from false positive cases (wrongly classified MRI images) with a high degree of precision. The model's high AUC value evidences its outstanding performance in accurately diagnosing brain and kidney tumors, indicating that the C-DCNN model has the capacity to offer dependable and precise diagnostic assistance to oncologists.
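A short sketch of how the ROC curve and AUC can be computed with scikit-learn, reusing the predicted probabilities (y_prob) from the evaluation step above, follows:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# y_prob: predicted probabilities for the positive class on the test set.
fpr, tpr, _ = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

plt.plot(fpr, tpr, label=f"C-DCNN (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], "--", label="chance (AUC = 0.50)")  # diagonal baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```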
4.5. Comparative Evaluation of C-DCNN Model with State-of-the-Art Advanced Deep Learning Models
By comparing our novel C-DCNN model with other sophisticated deep learning models, such as ResNet50, VGG16, VGG19, and InceptionV3, we provide a standard for evaluating its ability to generalize, contribute to progress in the field, and assist in making practical decisions. These models are renowned and extensively utilized transfer learning models in the field of computer vision, exhibiting exceptional performance and making major contributions to tasks such as image recognition and classification [57,58]. Through a comparative analysis of the C-DCNN model with these established models, we can accurately evaluate its performance, competitiveness, and potential superiority. This comparison situates the novel C-DCNN model in relation to current cutting-edge methods and validates its credibility and significance in the field of computer vision. As with the C-DCNN model, Jupyter Notebooks were utilized for the implementation of all the state-of-the-art transfer learning models. The fine-tuning approach implemented in our C-DCNN model to mitigate overfitting and underfitting was also applied to all the state-of-the-art models in our comparative evaluation.
ResNet50 is a CNN architecture renowned for its deep residual learning methodology. It tackles the issue of vanishing gradients in very deep networks, enabling the training of exceptionally deep models. It has achieved success in many image classification tasks and is widely recognized for its capability to capture intricate features from CT and MRI images. VGG16 and VGG19 are CNN architectures created by the Visual Geometry Group (VGG) at the University of Oxford. These models exhibit a consistent structure comprising numerous stacked convolutional and fully connected layers. VGG16 and VGG19 are renowned for their outstanding performance in large-scale image categorization tasks, demonstrating excellent accuracy as a result of their deep and detailed feature extraction capabilities. InceptionV3, a successor to GoogleNet, builds on the idea of inception modules, which effectively capture multi-scale information by utilizing parallel convolutions at various spatial resolutions. This architectural design minimizes computational complexity while still ensuring a high level of accuracy. InceptionV3 has been extensively utilized in many image recognition applications and has exhibited exceptional performance in object detection and localization.
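A minimal sketch of a typical transfer-learning setup for these baselines is shown below for ResNet50 with ImageNet weights; VGG16, VGG19, and InceptionV3 follow the same pattern. The classification head (dense layer sizes, dropout rate) is an illustrative assumption, not the exact configuration used in this study:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

# Pre-trained convolutional base; the original classification head is dropped.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze for initial training; unfreeze later to fine-tune

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary: brain vs. kidney tumor
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```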
The comparative evaluation of the novel C-DCNN model with state-of-the-art advanced deep learning models demonstrates differences in performance and offers distinct clinical implications. The C-DCNN model demonstrates superior performance compared to the other models, exhibiting enhanced predictive capabilities: it outperforms VGG16, InceptionV3, VGG19, and ResNet50 in terms of accuracy, precision, sensitivity, specificity, F1 score, AUC, and loss. The model attains 99% accuracy for both the synthetic and augmented datasets, outperforming the other models by a small margin, as demonstrated in Table 4. The C-DCNN model exhibits superior precision, sensitivity, and specificity compared to ResNet50, VGG16, VGG19, and InceptionV3, and an F1 score of 99% for both synthetic and augmented data demonstrates an exceptional equilibrium between precision and sensitivity. In addition, the C-DCNN model achieves a perfect AUC value of 1.00 for both datasets, demonstrating exceptional discriminative capability, and exhibits a smaller loss value, suggesting superior optimization and fewer errors. The enhanced precision of the C-DCNN model has notable clinical importance. The model achieves a classification accuracy of 99% in identifying brain and kidney tumors in MRI images, ensuring dependable and accurate outcomes. Oncologists and healthcare professionals engaged in the diagnosis and treatment of these tumors can benefit considerably from this level of accuracy: it decreases the likelihood of misdiagnosis, facilitating prompt identification and a suitable response, hence enhancing patient outcomes. The C-DCNN model surpasses the other models on multiple criteria, making it a more reliable and precise tool for aiding oncologists in critical decision making.
The exceptional efficacy of the novel C-DCNN model can be ascribed to various factors, including its distinctive architectural configuration, efficient training methodology, and enhanced capacity to learn and represent the characteristics pertinent to brain and kidney tumors. The inclusion of particular design elements, such as the Mixed-Scale Dense Convolution Layer, Self-Attention Mechanism, Hierarchical Feature Fusion, and Attention-Based Contextual Information, allowed the C-DCNN model to more efficiently capture and extract features relevant to brain tumor classification. The C-DCNN model was developed with an optimized configuration and effective training methodology, including the meticulous selection of hyperparameters such as the learning rate, batch size, and regularization approaches. These choices expedited convergence and enhanced the model's ability to find a more optimal solution. Although ResNet50 is a complex model renowned for its deep architecture and skip connections, it demonstrated the lowest performance among the compared models: it achieved an accuracy of 93.67% for the synthetic data and 92.59% for the augmented data, with a precision of 92.6% and an F1 score of 92.6% on the augmented datasets. These results reveal that ResNet50 produced a relatively larger number of false positives and false negatives, leading to less-than-optimal predictions. This outcome may arise from the distinctive attributes of the data used and the intricate structure of the ResNet50 model. InceptionV3 (GoogleNet) exhibited somewhat inferior performance compared to VGG19, an enhanced version of the VGG16 model, as evidenced by its lower sensitivity, specificity, F1 score, true positive rate, area under the curve (AUC), and accuracy. This discrepancy can potentially be ascribed to the heightened intricacy of the InceptionV3 and VGG19 models, resulting in an inadequate representation of the particular features pertinent to the classification tasks.
4.6. Expert Assessment of DCGAN-Generated Synthetic Data
To evaluate the synthesized data of brain and kidney tumors, we tasked two experienced radiologists with categorizing real, augmented, and synthetic images for both datasets. This experiment aimed to determine whether radiologists could distinguish between real images and synthetic images generated by the DCGAN; consistent classification outcomes would indicate the utility of the artificial data for training machine learning models. The image set presented to the experts comprised the following six categories:
Original brain tumor images (MRI);
Augmented brain tumor images (MRI);
Synthetic brain tumor images generated by the DCGAN;
Original kidney tumor images (MRI);
Augmented kidney tumor images (MRI);
Synthetic kidney tumor images generated by the DCGAN.
Task: The radiologists were asked to classify each image as either real (original/augmented) or synthetic.
Scoring: The accuracy of their classifications was recorded, with a focus on the difference in performance between real and synthetic images.
Results:
Expert 1: accurately recognized both real and synthetic brain and kidney tumors with a 20% success rate.
Expert 2: accurately classified the real, augmented, and synthetic brain and kidney images with a 25% success rate for each category.
Both experts achieved comparably low classification accuracy across the real, augmented, and synthetic images, indicating that they could not reliably distinguish the DCGAN-generated images from the real or augmented ones. This significant similarity validates the potential utility of the synthetic images for training machine learning models.
Sample Images: To provide further clarity on the quality of the synthetic images, we include sample images in Figure 10.
The comparable performance of the radiologists in categorizing real, augmented, and generated images confirmed the efficacy of the DCGAN in producing realistic synthetic images. The DCGAN not only generated new images but also provided notable benefits, such as the ability to generate synthetic images with diverse variations, enhance image quality, and streamline the training process by eliminating artifacts. These images can be confidently used for training machine learning models, enhancing the diversity and robustness of the datasets and, in turn, model performance.
4.7. Data Quality Assessment Using MSE
To evaluate the quality of the synthetic images generated by the DCGAN, we further calculated the Mean Squared Error (MSE) between the original and synthetic images. The MSE measures the average squared difference between two images, providing a quantitative assessment of image quality and, thereby, insight into the effectiveness of the DCGAN in generating high-quality synthetic data.
To calculate the MSE between the original and synthetic images, we followed these steps:
Step 1: Preprocess the images to ensure they are in the same format and size.
Step 2: Compute the pixel-wise differences between the original and synthetic images.
Step 3: Square the differences.
Step 4: Compute the mean of the squared differences.
The formula for MSE is as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - y_i \right)^2$$

where $x_i$ is the pixel value of the original image, $y_i$ is the corresponding pixel value of the synthetic image, and $n$ is the total number of pixels. The MSE was computed in the Python environment using NumPy (version 1.24.3).
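A minimal NumPy sketch of this computation is shown below. Image loading and pre-processing are omitted, and the illustrative usage employs randomly generated arrays rather than the study's actual images:

```python
import numpy as np

def mse(original: np.ndarray, synthetic: np.ndarray) -> float:
    """Mean Squared Error between two images of identical shape."""
    # Step 1: images are assumed pre-processed to the same size and format.
    diff = original.astype(np.float64) - synthetic.astype(np.float64)  # Step 2
    return float(np.mean(diff ** 2))  # Steps 3-4: square, then average

# Illustrative usage on two same-sized grayscale arrays (values in [0, 255]).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(128, 128))
b = a + rng.integers(-1, 2, size=(128, 128))  # a near-identical copy of a
print(mse(a, b))  # small values indicate close resemblance
```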
The MSE was computed for a representative sample of original and synthetic images from the brain tumor and kidney tumor datasets. With pixel values in the range [0, 255], the results showed an average MSE of 0.166, indicating that the synthetic images closely resembled the original images. These findings support the effectiveness of the DCGAN in generating high-quality synthetic data that can be used for model training and evaluation.
5. Conclusions
In conclusion, this study introduces a technique that emphasizes the medical importance of deep learning models, particularly the novel C-DCNN model, in the detection and diagnosis of brain tumors. The model was fed with MRI images generated by GANs (specifically, the sophisticated DCGAN) and with images augmented through traditional techniques, such as manipulating the geometric and color-space properties of the training set (rotation, scaling, cropping, brightness, zooming, and contrast), to enhance classification accuracy, and its performance was compared with that of state-of-the-art deep learning models, such as ResNet50, VGG16, VGG19, and InceptionV3, trained on the same augmented and synthetic datasets. We applied this methodology to brain and kidney tumor images. The C-DCNN model, built entirely from scratch using CNN architectures, demonstrated exceptional accuracy, precision, sensitivity, specificity, and F1 score, underscoring its potential as an efficient tool for swift and precise diagnosis. The novel C-DCNN model also demonstrated its efficiency by effectively distinguishing between brain and kidney tumors when trained on either augmented or synthetic datasets.
The primary discovery of this study is that synthetic datasets created using GANs can achieve performance comparable to that of augmented datasets when used for training deep learning models. This is consistent with studies such as [23,59,60] that have recorded improved accuracy using GAN-generated synthetic datasets. It demonstrates the considerable potential of synthetic datasets to become an established augmentation technique for deep learning in industrial applications, which typically feature small datasets; synthetic data also offer further advantages, such as privacy preservation. The proposed DCGAN (Deep Convolutional Generative Adversarial Network) for generating synthetic data provides a considerable advantage by reducing the time and opportunity costs associated with collecting real-world data. The results of this work add to the continuous progress in AI-driven cancer diagnosis and highlight the capability of deep learning models to enhance diagnostic precision and patient care, even when trained on synthetic datasets.
Despite the promising findings, several limitations warrant our acknowledgment. Firstly, our study primarily compared the novel C-DCNN model with state-of-the-art models, such as ResNet50, VGG16, VGG19, and InceptionV3. While these models are well-regarded in the literature, there may be other models with superior performance that we did not consider. Future studies should include a broader range of sophisticated models to ensure a comprehensive comparison with our novel C-DCNN model. Secondly, our study utilized the DCGAN to generate synthetic images. While the DCGAN is effective, there are more advanced models available that might produce higher-quality synthetic images. Future research should explore and test other GAN architectures to determine if they offer improved image-generation capabilities. Future research could incorporate additional clinical factors and expert insights to refine the evaluation and selection criteria for these models. This would provide a more holistic approach to the model assessment and potentially improve performance. To enhance the applicability and generalizability of our findings, future studies should expand the scope of our analysis to include a broader range of imaging modalities and cancer types. This can enable the model to be applicable in diverse clinical scenarios and improve its robustness. Finally, our study focuses on binary classification. Extending the research to multi-class classification problems is an important area for future work. This would allow the model to handle more complex scenarios and provide more detailed diagnostic information.
Incorporating synthetic data and the C-DCNN model into radiological workflows holds significant promise for improving diagnostic accuracy, enhancing training, and supporting clinical decision making. However, addressing the outlined limitations is essential for successful integration. Future research should focus on validating these technologies, gaining acceptance among medical professionals, ensuring technical compatibility, maintaining data privacy, and establishing frameworks for continuous learning.
Our future plans involve expanding our work to include additional medical fields that can benefit from synthetic data produced using GANs. This expansion can enhance training efforts and lead to improved classification outcomes across various medical disciplines. By leveraging synthetic data, we aim to address data scarcity issues and improve the robustness and accuracy of diagnostic models.