1. Introduction
In computer vision, one of the most sought-after capabilities is the effective classification of image data. As the availability of image-capture devices and the growing adoption of digital platforms drive exponential growth in digital data, there is an urgent need for robust and sophisticated models to make sense of this influx of visual information. Deep learning has been suggested for image classification because of its capacity to provide a deeper understanding of how a subject responds to specific visual stimuli. According to [1], deep learning-based techniques have recently produced remarkable results. The efficacy of a classification system depends on the quality of the features extracted from an image: the better the extracted features, the more accurate the results. Despite the significant advances demonstrated by numerous deep learning-based approaches, extracting all of the crucial information from images remains a challenge for these methods, which lowers overall classification accuracy. Image classification, object identification and recognition, and image segmentation can be broadly grouped as computer vision (CV) tasks, and image classification is among the most prevalent of them. These tasks are frequently applied, particularly in the medical field, and are usually framed as supervised learning, in which a set of features X (typically taken from the image) is used to predict a specific result y, or label. Before deep learning became widespread in 2012, traditional machine learning models such as support vector machines, random forests, and artificial neural networks were the most common way of handling computer vision tasks, including classification, object detection, and tracking [2].
Deep Convolutional Neural Networks (DCNNs) are advanced deep learning models used for image and video processing. Convolutional layers, unlike fully connected layers, link each neuron only to a limited region of the input. This localised filtering produces a feature map from the input data. A max-pooling layer, which takes the maximum value from each segment of the image, is often placed after the convolutional layers to down-sample the spatial dimensions of the data. DCNNs support image categorisation, object detection, semantic segmentation, face recognition, and artistic style transfer. In 2014, the Visual Geometry Group (VGG) at the University of Oxford created VGGNet [3]. VGGNet benefits from its architectural simplicity, although it has three times more parameters than AlexNet. The VGG architecture supports advanced object identification models, and on many tasks and datasets outside ImageNet it outperforms baselines. It remains one of the most popular image recognition architectures. AlexNet, VGG, ResNet, and Inception are among the DCNN designs that have improved efficiency and performance using new structures and concepts. In patients with COVID-19, respiratory symptoms are among the most common, and they may be detected via chest X-ray imaging. Additionally, a condition with mild symptoms may be diagnosed using chest CT scans. Normally, detection is accomplished by analysing signal data [4]. Existing deep learning models require a large number of trainable parameters, which not only increases the computational complexity of classification but also leads to over-fitting because COVID-19 X-ray images are scarce [5]. In recent years, deep learning models have proved to be a promising tool in medicine for diagnosing pathologies, including lung pathologies, and have shown highly promising results for other medical disorders. Many deep learning models and techniques have been developed to detect COVID-19 and pneumonia in chest X-ray and computed tomography (CT) images.
The Generative Adversarial Network (GAN) is a type of generative model built with deep learning techniques, typically convolutional neural networks. Generative modelling is an unsupervised learning task that involves automatically detecting and learning patterns in input data so that the model can produce new instances that could plausibly have been drawn from the original dataset. GAN models were proposed by [6] and have been widely used in image processing for translating an input picture into its matching output image. They pit a Generator against a Discriminator. Starting from random noise, the Generator iteratively creates realistic-looking data to trick the Discriminator, while the Discriminator must tell real data from fakes. Training is a delicate dance in which the Generator improves its counterfeits and the Discriminator improves its discernment. As training advances, the Generator should create data so realistic that the Discriminator can no longer distinguish them from real samples. GANs can generate lifelike representations of faces and objects, transfer styles between images, augment datasets (especially when authentic samples are scarce), improve image resolution, and even create molecular designs for potential medications. GANs also have drawbacks. Mode collapse, in which the Generator produces limited or repetitive outputs, can be problematic. Training can be unstable because of the dynamic tension between the two networks, which requires careful balancing. Traditional GANs also have trouble precisely guiding the Generator’s output. However, variants such as DCGANs and WGANs have improved stability, quality, and controllability, stretching the limits of GANs.
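The adversarial game described above is commonly written as a minimax objective, following the original GAN formulation. The symbols below (G, D, V, p_data, p_z) are standard notation introduced here for illustration and are not taken from this paper:

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here D(x) is the Discriminator’s estimated probability that x is real, and G(z) is the sample the Generator produces from noise z. The Discriminator raises V by classifying correctly, while the Generator lowers it by pushing D(G(z)) towards 1.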
Coronavirus disease (COVID-19) is a severe respiratory disease that was first identified in Wuhan, China, at the end of 2019. In this situation, the main priority is to create faster and more efficient diagnostic approaches to reduce the transmission of a severe acute respiratory syndrome such as COVID-19. The World Health Organization (WHO) reported just under 9.5 million new cases and over 41,000 new deaths in the week ending 2 January 2022. As of 2 January 2022, a total of nearly 289 million cases and just over 5.4 million deaths had been reported globally. Several vaccine options are now available, but it will take a long time for vaccines to reach every region of the world. As a result, visual markers can be employed as an alternative way to screen infected individuals quickly. The most common sign of the virus is a lung infection, for which chest radiography images, such as X-ray and computed tomography (CT) images, are extensively used as a visual signal [7]. Reverse Transcription Polymerase Chain Reaction (RT-PCR) is frequently used for COVID-19 detection. It requires expert laboratory personnel and testing equipment, and processing a sample takes from two hours to several days. In addition, RT-PCR produces inaccurate results in some circumstances because of its high false-negative rate (39–60 per cent) [8]. New variants of the SARS-CoV-2 virus have made it more difficult to identify using current diagnostic methods. Conventionally, radiologists interpret chest X-ray images to find visual patterns that can confirm a COVID-19 infection. While RT-PCR testing has become more accurate with time, it still exposes medical staff to risk, and it is more expensive because diagnostic test kits are required for each patient. In comparison, medical imaging procedures such as X-rays and CT scans, which are considerably quicker, safer, and more widely available, can be employed for screening. For COVID-19 screening, X-ray imaging is preferable to CT scanning since it is more widely available and less expensive [9]. Still, manually diagnosing the virus from X-ray images can be time-consuming, and with little or no prior experience it can lead to inaccuracies and human error. As a result, there is a strong need to automate such operations broadly and to make them available to everyone so that diagnosis becomes more efficient, accurate, and rapid.
To address the aforementioned challenges, researchers and practitioners have developed automated detection systems for coronavirus infections using artificial intelligence (AI) techniques [10]. In the past decade, the combination of AI with medical imaging has aided several industries, including healthcare, in diagnosing and treating a variety of conditions. Deep Neural Networks [11] have recently been adopted in the healthcare sector, and the Convolutional Neural Network (CNN) is a well-known example. Deep learning models have been successfully applied in a variety of fields, including medical data segmentation, classification, and lesion identification [12]. Respiratory symptoms, among the most common symptoms of COVID-19, may be detected through chest X-ray imaging, and a condition with mild symptoms may also be diagnosed using chest CT scans. Typically, detection is accomplished by analysing indicator data (Mohsin Ahmed and Wael Abdullah, 2021). For a CNN to detect a coronavirus infection from chest X-ray images, massive amounts of training data are required. However, adequate chest X-ray datasets with equal numbers of COVID-19 and normal images are unavailable, and this lack of labelled data may contribute to the class imbalance issue [13]. Imaging is the fastest and most accurate way to detect COVID-19. Researchers use X-ray images for COVID-19 detection mainly because of their low cost and wide availability compared with other imaging methods. Furthermore, X-ray imaging uses less radiation than CT imaging. It is also used to detect lung cancer and cardiac diseases, and it is widely used, especially in low-income countries. CT scans, by contrast, provide more accurate diagnoses than X-rays [14], but they are expensive and expose patients to more radiation. Both CT and X-ray images are popular for COVID-19 identification. On X-rays, ground-glass opacification is seen in the upper right lung. On CT scans, the typical findings are ground-glass areas in the lower lung, halo signs, and consolidation areas in the lower lobes [15, 16, 17, 18]. X-ray and CT imaging features of COVID-19 and non-COVID cases are shown in Figure 1 [19].
A single deep learning model may not classify X-ray images well. Ensemble learning, in which several machine learning models are combined, can improve classification and regression predictions. Researchers therefore combine deep learning models to improve results, but having too little data, especially images, can cause overfitting. Extracting patterns requires training many parameters, which is computationally expensive and demands tricky hyperparameter tuning [5]. In disease diagnosis, hyperspectral imaging (HSI) captures and processes data from across a broad range of the electromagnetic spectrum. Unlike traditional cameras, which capture only red, green, and blue, hyperspectral cameras can capture images in dozens or hundreds of spectral bands, each representing one wavelength range of the electromagnetic spectrum [20]. Deep Convolutional Neural Networks (DCNNs) and Generative Adversarial Networks (GANs) have emerged as two of the most prominent and transformative deep learning techniques for image processing tasks. While DCNNs have been widely used for image classification because they learn spatial hierarchies of features automatically and adaptively, GANs have gained popularity because they can generate new data samples that are coherent and often indistinguishable from real data. Combining these two architectures offers a novel way to improve the performance and robustness of image classification. An integrated model that leverages the strengths of both has the potential to overcome the challenges encountered when they are used separately. GANs, for example, can augment the training dataset, addressing issues such as data scarcity or class imbalance, while DCNNs can use this enriched dataset to perform more accurate classification. Many practitioners and researchers have combined several deep learning models on COVID-19 X-ray images to obtain better results and reduce variation. For example, the authors of [21] proposed a DCGAN-based CNN model that generates synthetic CXR images using different datasets as references; the authors of [5] examined a novel attention-based deep learning model that adds an attention module to VGG16 for computer-aided diagnosis (CAD); and the authors of [9] developed automatic detection of COVID-19 cases using a combination of GANs and Deep Convolutional Neural Network models. Likewise, the authors of [22] improved COVID-19 detection using GAN-based data augmentation and a novel QuNet-based classification. This study examines the integration of GANs and DCNNs to determine whether such a fusion can improve image classification performance, both qualitatively and quantitatively. Investigating the methodologies, challenges, and breakthroughs in this domain provides a comprehensive understanding of the benefits and drawbacks of combining these two technologies. In this study, the authors propose a GAN model to generate synthetic COVID-19 X-ray datasets and a VGG16 model to analyse the X-ray images.
This section provides some background on the emergence of deep learning approaches related to COVID-19 issues. In Section 2, a review of the theoretical preliminaries of GAN and VGG16 is discussed, including the confusion matrix for the evaluation of the development of deep learning models. Section 3 illustrates the development of the proposed integrated Generative Adversarial Networks and Deep Convolutional Neural Networks for image data classification. Section 4 discusses the application of the proposed integrated deep learning model on the COVID-19 image dataset and presents an evaluative analysis, with some important remarks being discussed. Finally, Section 5 concludes this paper.
4. Results and Discussion
This section discusses and interprets the results of developing an integrated GAN and VGG16 model and applying it to X-ray images for COVID-19 detection. Data preparation, model development, and evaluation are discussed in turn.
In data preparation, the input data, which initially have a range of [0, 1], are rescaled to the range [−1, 1]. The reason for this rescaling is that using the tanh activation function in the generator output, whose range is [−1, 1], often produces better results. A sigmoid activation in the generator output is also common and would not require rescaling the images, because the sigmoid maps input values to an output range between 0 and 1. Rescaling in terms of resolution, by contrast, means resizing the image by changing its width and height, which is a different process not directly related to the activation functions in neural networks. The resolution used in this study is 64 × 64 pixels. This preprocessing is intended to keep the model from learning noise and specific orientations, making it more robust. One-hot encoding is applied to the labels during normalisation to create the dependent attribute that marks each image as either COVID or normal. One-hot encoding represents categorical variables as binary vectors, whereas label encoding converts labels or words into numeric form. The data are then partitioned into training and testing sets of 80 per cent and 20 per cent, respectively. The scaling (augmentation) and splitting steps are crucial for assessing how well the model generalises to new, unseen data, which is a key indicator of whether overfitting has occurred.
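The preparation steps above can be sketched as follows. This is a minimal illustration using NumPy, not the authors’ code: the function names, the random stand-in images, the label coding (0 = normal, 1 = COVID), and the fixed random seed are all assumptions made for the example.

```python
import numpy as np

def rescale_to_tanh_range(images01):
    """Map pixel values from [0, 1] to [-1, 1] to match a tanh generator output."""
    return images01 * 2.0 - 1.0

def one_hot(labels, num_classes):
    """Turn integer class labels into binary indicator vectors."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def train_test_split_80_20(x, y, seed=0):
    """Shuffle once, then keep 80 per cent for training and 20 per cent for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    cut = int(0.8 * len(x))
    train_idx, test_idx = order[:cut], order[cut:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Hypothetical stand-in data: 10 random 64 x 64 RGB images with values in [0, 1].
rng = np.random.default_rng(42)
images = rng.random((10, 64, 64, 3))
labels = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])  # assumed coding: 0 = normal, 1 = COVID

x = rescale_to_tanh_range(images)
y = one_hot(labels, num_classes=2)
x_train, x_test, y_train, y_test = train_test_split_80_20(x, y)
```

If a sigmoid output were used in the generator instead, the call to `rescale_to_tanh_range` would simply be skipped, since the images would already lie in the sigmoid’s [0, 1] range.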
Once data preparation is complete, development of the GAN model starts with the generator. The standard VGG16 architecture is designed for three-channel RGB images; it can be modified to accept one-channel images by adjusting the input layer, which is sometimes done when dealing exclusively with greyscale images. For the COVID dataset, the generator starts with a 100-node latent vector, connects it to a Dense layer of 4416 nodes, and reshapes the result to 8 × 8 × 69. The data are then up-sampled to an output size of 64 × 64 by passing them through transposed convolutions. An ordinary convolution is used in the output layer, where the number of filters is reduced from 207 to just 3, representing the colour channels. The process is similar for the normal dataset: the generator starts with a 100-node latent vector, connects it to a Dense layer of 1600 nodes, and reshapes the result to 8 × 8 × 25. As with the COVID dataset, the data are up-sampled to an output size of 64 × 64 through transposed convolutions, and the number of filters is reduced from 75 to 3. For both datasets, the generator and discriminator are then combined, with the discriminator’s weights frozen in the combined model. The combined model is compiled with the binary_crossentropy loss function and the Adam optimizer with a learning rate of 0.0002. The combination of generator and discriminator forms the trainable GAN depicted in Figure 4.
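The shape arithmetic in the generator description can be checked with a short sketch. The text gives the 8 × 8 starting grid and the 64 × 64 output but not the number of transposed convolutions or their strides, so the sketch assumes stride-2 layers whose padding makes each one exactly double the spatial size. The function names are hypothetical.

```python
def dense_nodes(height, width, channels):
    """Number of Dense units needed so the output can be reshaped to H x W x C."""
    return height * width * channels

def upsample_sizes(start, target, stride=2):
    """Spatial sizes after each transposed convolution.

    Assumes padding is chosen so that output size = input size * stride.
    """
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * stride)
    if sizes[-1] != target:
        raise ValueError("target is not reachable with this stride")
    return sizes

covid_nodes = dense_nodes(8, 8, 69)    # Dense width for the COVID generator
normal_nodes = dense_nodes(8, 8, 25)   # Dense width for the normal generator
path = upsample_sizes(8, 64)           # spatial size after each up-sampling step
print(covid_nodes, normal_nodes, path)
```

Under these assumptions, three up-sampling steps take the grid from 8 to 64. The filter counts quoted before the output layer, 207 and 75, equal 3 × 69 and 3 × 25, respectively.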
In contrast to the generator, the discriminator performs the opposite function: a 64 × 64 image is passed through several convolutional layers to produce a real-or-fake binary classification output. The two models are then combined to form a Deep Convolutional GAN. Within the combined model, the authors make the discriminator non-trainable, so that only the generator is updated when the combined model is trained; the discriminator itself is trained separately on a combination of real and generated data.
To assist with sampling and generating data for the two models, the authors develop three straightforward functions. The first samples real images from the training data, the second draws random vectors from the latent space, and the third feeds latent variables into the generator to produce fake examples. Once the models have been trained on the COVID and normal datasets, fake (generated) images are created, as shown in Figure 5 and Figure 6, and saved in a dedicated folder for further processing.
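The three helper functions might look roughly like the sketch below. It is an illustration rather than the authors’ implementation: the function names, the use of a standard normal latent distribution, the 1 = real and 0 = fake label convention, and the `toy_generator` stand-in (used so that the example runs without a trained network) are all assumptions.

```python
import numpy as np

def sample_real(dataset, n, rng):
    """Pick n real images at random and label them 1 (real)."""
    idx = rng.integers(0, len(dataset), size=n)
    return dataset[idx], np.ones((n, 1))

def sample_latent(n, latent_dim, rng):
    """Draw n random vectors from a standard normal latent space."""
    return rng.standard_normal((n, latent_dim))

def generate_fake(generator, n, latent_dim, rng):
    """Feed latent vectors to the generator and label its outputs 0 (fake)."""
    z = sample_latent(n, latent_dim, rng)
    return generator(z), np.zeros((n, 1))

def toy_generator(z):
    """Hypothetical stand-in for a trained generator: 64 x 64 x 3 outputs in [-1, 1]."""
    flat = np.tanh(z @ np.ones((z.shape[1], 64 * 64 * 3)) * 0.01)
    return flat.reshape(-1, 64, 64, 3)

rng = np.random.default_rng(0)
real_data = rng.uniform(-1.0, 1.0, size=(20, 64, 64, 3))  # placeholder "real" images

x_real, y_real = sample_real(real_data, 4, rng)
x_fake, y_fake = generate_fake(toy_generator, 4, 100, rng)
```

In a training loop, a real batch and a fake batch of this kind would typically be used to update the discriminator, after which fresh latent vectors would be passed through the combined model to update the generator.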
The GAN models developed in this study are evaluated using accuracy and other measurements from the confusion matrix. Figure 5 and Figure 6 show some of the fake (generated) images created after training the model for 2000 epochs on the COVID and normal datasets.
The authors conducted a comparison study to predict COVID-19 using the established VGG16 CNN model and the integrated VGG16 CNN and deep convolutional GAN (VGG16 CNN + GAN) model. The integrated VGG16 CNN + GAN model uses a dataset that combines the real data with the synthetic (generated) data from the GAN phase. In the original dataset, the COVID and normal classes have 69 and 25 images, respectively. After synthetic images were generated by the deep convolutional GAN for both classes, the COVID and normal classes grew to 117 and 108 images, respectively. Creating the synthetic images thus largely resolves the class imbalance for the classification model. The VGG16 CNN + GAN model is developed with the same hyperparameter settings as the original VGG16 CNN model: an input size of 224 × 224 pixels and max-pooling with a 2 × 2 window and stride = 2. Figure 7 presents the images predicted by the proposed VGG16 CNN + GAN model on the training data.
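To make the pooling setting concrete, the sketch below applies 2 × 2 max-pooling with stride 2 to a small array and then traces how a 224 × 224 input shrinks across the five pooling stages of the standard VGG16 architecture. The function name and the demo values are illustrative assumptions.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

demo = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 1, 2, 3],
                 [0, 5, 4, 1]])
pooled = max_pool_2x2(demo)   # each output cell is the maximum of one 2 x 2 block

# Spatial size of a 224 x 224 input after each of VGG16's five pooling stages.
size = 224
sizes = []
for _ in range(5):
    size //= 2
    sizes.append(size)
```

Each pooling stage halves the height and width, so the feature maps entering the fully connected layers of VGG16 are 7 × 7.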
During training, the validation loss reaches its lowest value at epoch 10, and training is stopped at that point. The authors set early stopping to prevent the model’s generalisation performance from deteriorating through continued training. This choice is justified if the model converges early, achieving satisfactory performance within the first 10 epochs with no significant improvement in key metrics such as accuracy or loss thereafter. GANs and CNNs may converge rapidly, particularly when the model designs are optimised and the dataset is well suited to the objective. Early stopping is crucial for preventing overfitting, in which the model performs well on training data but poorly on unseen data, a common risk in prolonged training. Stopping at 10 epochs also saves computational time and power, which is particularly important in resource-intensive deep learning tasks.
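The early-stopping rule can be illustrated with a small patience-based sketch. The text states only that the lowest validation loss occurred at epoch 10, so the patience value, the function name, and the loss values below are hypothetical and chosen purely for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation-loss curve whose minimum falls at epoch 10.
losses = [0.90, 0.71, 0.60, 0.52, 0.47, 0.43, 0.40, 0.38, 0.37, 0.35,
          0.36, 0.38, 0.41, 0.45]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

With a rule of this kind, the weights saved at the best epoch are normally the ones restored for evaluation, so a few extra epochs spent waiting do not affect the final model.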
Table 1 presents the evaluation results from the confusion matrix for the proposed integrated deep learning model (VGG16 CNN + GAN) and the established VGG16 CNN. On the test set, the proposed integrated model achieves accuracy, sensitivity, specificity, and F1-score of 0.9655, 1, 0.8750, and 0.96, respectively. The COVID-19 samples from the GAN-augmented chest X-ray dataset were fed into the proposed deep learning model, which uses the same settings and hidden layers as the original VGG16 CNN. Additionally, the authors selected each model’s best-performing version and then assessed both models against the test set. The proposed VGG16 CNN + GAN model achieves higher accuracy, sensitivity, and F1-score, as presented in Table 1, than the VGG16 CNN, whose accuracy, sensitivity, specificity, and F1-score are 0.9474, 0.9286, 1, and 0.95, respectively, although the VGG16 CNN has the higher specificity. This shows that the proposed integrated deep learning model is promising. According to these results, the proposed model (VGG16 CNN + GAN) addresses problems such as lack of data and uneven class distribution by adding more, and more varied, data to the dataset, which makes the model more accurate, applicable, and robust. The method helps make diagnoses more accurate, works with new cases, can be scaled up, and is inexpensive, which makes it especially useful in medical research and diagnosis.
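For reference, the four reported measures follow from the binary confusion-matrix counts as sketched below. The counts used in the example are hypothetical and are not the values behind Table 1; the function name is likewise an assumption.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and F1-score from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)          # recall on the positive (COVID) class
    specificity = tn / (tn + fp)          # recall on the negative (normal) class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Hypothetical counts for illustration only; they are not taken from Table 1.
metrics = confusion_metrics(tp=45, fn=5, fp=3, tn=47)
```

Sensitivity falls with every missed COVID case (false negative), while specificity falls with every normal image flagged as COVID (false positive), which is why the two models can each lead on a different measure.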
The model is additionally assessed on test sets that were not exposed to the model during training. The proposed VGG16 CNN + GAN, with 224 × 224 pixel input, Max-Pooling size of 4 × 4, and stride = 2, yields an AUC (Area Under the Curve) for COVID-19 of 0.98, indicating that the model can accurately differentiate between COVID-19 and normal instances under these conditions. For the established model, the AUC is 0.95. This suggests that the proposed model exhibits greater consistency and resilience across all classes, even in the presence of non-uniform sample distribution. Figure 8 compares the AUC scores of the two models.
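One way to read an AUC value is as the probability that a randomly chosen COVID-19 image receives a higher predicted score than a randomly chosen normal image. The sketch below computes AUC directly from that definition; the labels and scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted COVID probabilities; 1 = COVID, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, y_score)
```

Because this measure depends only on how positives rank against negatives, it is less sensitive to class proportions than accuracy, which is relevant when the sample distribution is non-uniform.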
Figure 9 depicts the application of Grad-CAM on a trained model to visualise the impact of COVID-19 on an infected individual. The original COVID-19 image (a) and the class activation map for COVID-19 (b) are shown to highlight the areas of interest in our model’s prediction. These areas are represented by high-intensity visuals in blue and green. The utilisation of Grad-CAM in this study amplifies the interpretability and explanatory capacity of the proposed deep learning model.
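For readers unfamiliar with Grad-CAM, the class activation map is usually defined as follows. The notation is the standard one from the Grad-CAM literature and is introduced here for illustration, not taken from this paper: A^k is the k-th feature map of the chosen convolutional layer, y^c is the score for class c before the softmax, and Z is the number of spatial positions in the feature map.

```latex
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\Bigl( \sum_{k} \alpha_k^{c} A^{k} \Bigr)
```

The weights α measure how strongly each feature map influences the class score, and the ReLU keeps only the regions that contribute positively to the predicted class, which are then overlaid on the input image as a heat map.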
The results indicate that a patient diagnosed with COVID-19 is more likely to receive a False Positive result when tested using the proposed model. Hence, to achieve precise identification of COVID-19 cases with improved recall, it is recommended to train the model on radiology images that exhibit symptoms of COVID-19. This would make it possible to identify accurately the COVID-19 patients who were previously misdiagnosed as False Positives, leading to unbiased identification of COVID-19 cases in a live setting.
The authors were also concerned about generalisation. Generalisation refers to the ability of the model to adapt and respond appropriately to novel, previously unobserved data drawn from the same distribution as the data used to develop the model. In other words, it measures how effectively a model can take in new data and produce accurate predictions after being trained on a training set. The success of a model depends on how effectively it can generalise. A model cannot generalise if it is fitted too closely to the training data. When presented with new data in such circumstances, it will make incorrect predictions, which renders it useless even though it produces accurate predictions for the training set. This is known as overfitting. Although the proposed integrated VGG16 CNN and deep convolutional GAN provides almost 97 per cent accuracy, overfitting may be an issue here. Several measures can reduce overfitting and make the model more robust, such as adding more data, using data augmentation, using architectures that generalise well, adding regularisation (mostly dropout; L1/L2 regularisation is also possible), and reducing architecture complexity.