1. Introduction
COVID-19, or coronavirus disease, is a recent disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [
1]. By February 2021, over 106 million SARS-CoV-2 infections and more than 2.3 million deaths had been recorded globally, in the worst epidemic to have troubled humans since the Spanish flu of 1918; the pandemic has consequently overwhelmed global healthcare systems, resulting in serious economic disturbance [
2]. In order to combat this infectious disease, doctors and researchers have tirelessly attempted to gain further knowledge and develop technical devices for the diagnosis of this disease. Attempts include the advancement of drugs and vaccines [
3], the formation of epidemiologic methods for forecasting the spread of the disease among people [
4], the advancement of mobile device apps to track patients who are infected, the enhancement of policies, and the implementation of novel technologies for managing the capacities and resources in clinics [
5]. Owing to the increasing prevalence of coronavirus infection worldwide, many studies have analyzed various aspects of this infection. Research includes recognizing methods of virus detection, examining the origin of the coronavirus and its gene sequence, studying early cases in the infected countries, forecasting COVID-19 cases during the epidemic, and analyzing patient information [
6].
Machine learning (ML) procedures first accumulate data, i.e., from various sources [
7], followed by preprocessing the data to solve data-oriented problems and minimize the space size by erasing incorrect data and selecting relevant data. However, in some cases, the dataset values may differ for the system in making decisions; thus, ML methods were devised with the help of other ideas, such as probability, statistics, and control theory, for examining data and deriving useful and original information, concealed information, or patterns from past knowledge [
8]. Lastly, the performance assessment of the methods is conducted, and model optimization concludes the process, which enhances the method with the help of novel rules and a dataset. ML approaches are utilized in various sectors such as traffic management, medicine, education, manufacturing and production, robotics, engineering, and so on [
9,
10,
11]. Currently, ML methods are utilized in the examination of high-dimensional, structured and unstructured bio-medical datasets. Malaria diagnosis, diabetes risk assessment, typhoid and vascular disease classification, and genomic and genetic data studies are a few instances of the bio-medical use of ML approaches.
The authors in [
12] developed a novel image-processing-based approach for the healthcare system, called “C19D-Net”, for the detection of COVID-19 in CXR images to assist radiologists in enhancing COVID-19 classification performance. It uses the InceptionV4 model with a multi-SVM classifier for COVID-19 diagnosis across distinct class labels. In [
13], the researchers present a solution to analyze the COVID-19 pandemic dataset. Specifically, the solution focuses on analyzing useful knowledge and valuable information (for example, frequency, distribution, and pattern) of healthcare statuses and characteristics in populations. The availability of this knowledge and information assists users (for example, civilians and researchers) in better understanding the disease and playing an active role in combating, fighting, or controlling it. Li et al. [
14] performed a qualitative and quantitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stage of the COVID-19 epidemic.
The researchers in [
15] reviewed the available approaches while anticipating the challenges and difficulties in the development of a data-driven strategy to combat the COVID-19 outbreak. A 3M-analysis was presented, which included making, monitoring, and exhibiting decisions. The emphasis was placed on the potential of a familiar data-driven scheme to tackle the various problems that arose following the outbreak: firstly, by forecasting and monitoring the spread of the epidemic; secondly, by assessing the efficiency of government decisions; and lastly, by making appropriate decisions. Every step of the roadmap may be exhaustive for an analysis of consolidated theoretical outcomes and potential applications in the COVID-19 context. Supervised ML methods for COVID-19 infection were designed in [
16] with learning algorithms including DT, LR, NB, ANN, and SVM, using epidemiology data labeled for negative and positive COVID-19 cases. A correlation coefficient (CC) analysis between the independent and dependent features was performed to determine the stronger relationships among every dependent and independent feature of the dataset prior to developing the model.
The researchers in [
17] processed available information on US states to create an integrated dataset on potential factors that cause the epidemic to spread. Then, the researchers made use of supervised ML approaches to reach a consensus and rank the crucial factors. The authors performed a regression analysis to identify the crucial pre-lockdown factors that affected post-lockdown contamination and death, in order to inform lockdown-related policy making. Kim et al. [
18] aimed to examine the relationship between obesity, as determined by the body mass index (BMI), and mortality and morbidity owing to COVID-19. Information from 5628 confirmed COVID-19 patients was gathered by the Center for Disease Control and Prevention of Korea. The odds ratios (OR) of diabetes and morbidity across the BMI groups were examined using LR models adjusted for similar covariates. Hawkins et al. [
19] designed a nationwide county-level COVID-19 dataset that was coupled with the Distressed Communities Index (DCI) and its constituent metrics of socio-economic status.
This study developed an Enhanced Gravitational Search Optimization with Hybrid Deep Learning Model (EGSO-HDLM) for COVID-19 diagnosis using epidemiology data. The proposed EGSO-HDLM model employs a hybrid convolutional neural network with gated recurrent unit based fusion (HCNN-GRUF) model. In addition, the hyperparameter optimization of the HCNN-GRUF model was improved by the use of the EGSO algorithm, which was derived by including the concepts of cat map and the traditional GSO algorithm. To assure the enhanced performance of the EGSO-HDLM model, an extensive set of experimental validation processes on a benchmark dataset was performed. In short, the paper’s contribution is summarized as follows.
An intelligent EGSO-HDLM model which includes pre-processing, HCNN-GRUF classification, and EGSO-based parameter optimization is presented. To the best of our knowledge, the EGSO-HDLM model has not previously been presented in the literature.
Hyperparameter optimization of the HCNN-GRUF model with the EGSO algorithm using cross-validation helps to boost the predictive outcome of the EGSO-HDLM model on unseen data.
Validating the performance of the proposed EGSO-HDLM model on a benchmark dataset from the Kaggle repository.
2. The Proposed Model
In this study, a new EGSO-HDLM model was designed for the identification and classification of COVID-19 using epidemiology data. Primarily, the presented EGSO-HDLM model applied data normalization to pre-process the actual data. Then, the HCNN-GRUF model was applied for the classification process. Moreover, the hyperparameter optimization of the HCNN-GRUF model was improved by the use of the EGSO algorithm.
Figure 1 depicts the overall process of the EGSO-HDLM algorithm.
2.1. Data Pre-processing
When the numerical processing was executed, a huge variance in magnitude amongst the values became apparent. This condition can cause problems such as slow convergence of the network and saturation of the neuron outputs [
20]; therefore, the original data needed to be normalized. The following Min-Max normalization approach (Equation (1)) can be used to normalize the data to the interval of zero to one:
x′ = (x − x_min) / (x_max − x_min), (1)

where x′ signifies the normalized data, x represents the original data, x_min implies the minimal data value of the present element, and x_max denotes the maximal data value of the present element.
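As a concrete illustration, Equation (1) can be sketched in NumPy as follows; the column-wise treatment of elements and the guard for constant-valued columns are implementation choices, not part of the original formulation:

```python
import numpy as np

def min_max_normalize(x):
    """Scale each element (feature column) of x to the [0, 1] interval, Eq. (1)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)                               # minimal value of each element
    x_max = x.max(axis=0)                               # maximal value of each element
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (x - x_min) / span
```

Applied to a small matrix such as `[[1, 10], [2, 20], [3, 30]]`, each column is mapped onto [0, 1] independently.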
2.2. Design of HCNN-GRUF Based Classification
CNN [
21] is a hierarchical FFNN approach that depends on convolutional as well as pooling functions. Convolution is utilized to extract local features for further processing by subsequent layers. Primarily, the input layer is convolved with several filters (convolutional kernels). Next, the convolutional layer output is created by linking the outcomes of the convolutional function, which differ with the filter size h:

c_i = f(W · x_{i:i+h−1} + b). (2)

In Equation (2), c_i refers to the i-th feature extracted by the convolutional function, f denotes the non-linear activation function, W signifies the weights of the filter, h denotes the filtering window size, and b implies the bias. The GRU, like the LSTM, is a gated recurrent neural network (RNN) that remembers a long order of data and thus decreases data loss. Relative to the LSTM, the GRU decreases the number of gated units, which reduces the processing time but preserves the accuracy. The GRU has two gated sub-units, namely the reset and update gates. At every moment, the GRU receives the current input as well as the hidden state of the preceding moment with its update gate, and defines the activation state of its individual neurons. Additionally, the reset gate receives both states and defines whether any input data are forgotten.
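The convolutional feature extraction of Equation (2) can be sketched as follows; the matrix shapes and the choice of ReLU for the non-linear function f are illustrative assumptions:

```python
import numpy as np

def conv1d_features(x, w, b=0.0):
    """Slide a filter of window size h over the input x (Equation (2)).

    x: (seq_len, dim) input matrix, w: (h, dim) filter weights, b: scalar bias.
    Returns the feature map [c_1, ..., c_{seq_len-h+1}] with ReLU as f.
    """
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    h = w.shape[0]
    feats = []
    for i in range(x.shape[0] - h + 1):
        window = x[i:i + h]                                     # x_{i:i+h-1}
        feats.append(max(0.0, float(np.sum(w * window)) + b))   # f(W . x + b)
    return np.array(feats)
```

A filter of window size h = 2 over a sequence of length 4 yields three features, one per window position.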
Figure 2 depicts the framework of the GRU technique. The input at the present moment is integrated with the weighted output of the reset gate to obtain the memory content at the present moment with activation functions. Next, the update gate uses the memory content at the present moment and the hidden state of the preceding moment to determine the output and hidden state at the present moment. The function of the GRU is summarized in the following formulas:

z_t = σ(W_z x_t + U_z h_{t−1}), (4)
r_t = σ(W_r x_t + U_r h_{t−1}), (5)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1})), (6)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t. (7)

In Equation (4), z_t signifies the update gate; W_z and U_z imply the weights of z_t; σ stands for the activation function; x_t indicates the input at the present moment; and h_{t−1} represents the hidden output of the preceding moment. In Equation (5), r_t stands for the reset gate, and W_r and U_r refer to the weights of r_t. In Equation (6), h̃_t defines the memory content at the present moment, tanh implies the activation function, and W_h and U_h represent the weights at the present moment. In Equation (7), h_t defines the outcome at the present moment.

Pooling is the procedure of extracting data for the purpose of reducing the size of the GRU layer output. The maximal pooling approach is utilized to extract the maximal vector value of all the inputs as the result of the pooling layer:

p = max(h_1, h_2, …, h_n),

where h_i denotes the i-th vector passed from the GRU to the pooling layer, max(·) implies the maximal value of its inputs, p signifies the output of the pooling layer, and n denotes the number of features input to the pooling layer. The fusion layer is designed to merge more than two layers of tensors; in this case, it integrates several tensors from the pooling layers as opposed to one tensor. It can be executed by the “Concatenate” approach, which receives the final axis and splices all the pooling layer outputs to create the result of this layer:

O = Concatenate(p_1, p_2, …, p_m). (10)

In Equation (10), O refers to the output of this layer and p_1, …, p_m denote the outputs of the pooling layers. The softmax classifier is usually employed for multi-class problems. The fully connected (FC) layer is utilized to connect the input from the fusion layer, and the value of all the neurons is computed utilizing the softmax function. Additionally, the maximal value of the neuron outputs is taken as the classification outcome:

ŷ = softmax(W_s O + b_s), (11)
L = argmax(ŷ). (12)

In Equation (11), ŷ signifies the probabilities of the forecast labels, and W_s and b_s denote the weights as well as the biases. In Equation (12), L denotes the final forecast label.
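A minimal NumPy sketch of the HCNN-GRUF forward-pass steps described by Equations (4)-(12) is given below; the parameter dictionary P and the weight names are hypothetical placeholders, and the fusion layer is reduced to a plain concatenation:

```python
import numpy as np

def sigmoid(a):
    """Logistic activation used by the update and reset gates."""
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU time step following Equations (4)-(7); P maps names to weights."""
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev)              # update gate, Eq. (4)
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev)              # reset gate, Eq. (5)
    h_tilde = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev))  # memory content, Eq. (6)
    return (1.0 - z) * h_prev + z * h_tilde                    # hidden output, Eq. (7)

def max_pool(H):
    """Element-wise maximal pooling over the GRU outputs (rows = time steps)."""
    return np.max(H, axis=0)

def fuse_and_classify(pools, Ws, bs):
    """Concatenate the pooling outputs (Eq. (10)), apply softmax (Eq. (11)),
    and take the argmax as the predicted label (Eq. (12))."""
    o = np.concatenate(pools)
    logits = Ws @ o + bs
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs, int(np.argmax(probs))
```

With zero-initialized gate weights, both gates output 0.5 and the hidden state is simply halved at each step, which makes the sketch easy to verify by hand.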
2.3. Hyperparameter Tuning
At this stage, the hyperparameters of the HCNN-GRUF model are tuned by the design of the EGSO algorithm. Rashedi [
22] developed the GSO algorithm, an optimization algorithm that has gained popularity in the past few decades. It depends on the second law of motion, as illustrated in Equation (15), and the law of gravity, as demonstrated in Equation (13). Additionally, it depends on the physical concepts of passive gravitational mass, inertial mass, and active gravitational mass. The second law of motion states that once a force F is applied to a particle, the acceleration a is defined by the mass M and the force:

a = F / M. (15)

The law of gravity says that every particle attracts every other particle with a gravitational force F. The gravitational force F between two particles is directly proportional to the product of their masses M_1 and M_2 and inversely proportional to the square of the distance R between them:

F = G (M_1 M_2) / R². (13)

Here, G indicates the gravitational constant, which reduces with increasing time; it is evaluated as a function of its initial value G_0 and the time t, i.e., G(t) = G(G_0, t).
The GSA algorithm is briefly described in the following steps:
First, an isolated system with N particles is assumed, where the position of the i-th particle is represented as follows:

X_i = (x_i^1, …, x_i^d, …, x_i^n), i = 1, 2, …, N. (16)

In Equation (16), x_i^d indicates the position of the i-th particle in the d-th dimension.
Here, the best and worst fitness values are evaluated using Equations (17) and (18), correspondingly, for a minimization problem, and using Equations (19) and (20) for a maximization problem:

best(t) = min_{j ∈ {1, …, N}} fit_j(t), (17)
worst(t) = max_{j ∈ {1, …, N}} fit_j(t), (18)
best(t) = max_{j ∈ {1, …, N}} fit_j(t), (19)
worst(t) = min_{j ∈ {1, …, N}} fit_j(t). (20)

In these equations, fit_j(t) refers to the fitness of the j-th particle at time t.
Here, the gravitational constant at time t can be evaluated using Equation (21):

G(t) = G_0 e^(−α t / T), (21)

where G_0 is the primary value of the gravitational constant, initialized in a random fashion, α is a constant coefficient, t indicates the present time (iteration), and T denotes the overall time (total number of iterations).
Here, the gravitational and inertial masses are updated with the help of the fitness function. Equality of the gravitational and inertial masses is assumed (M_ai = M_pi = M_ii = M_i), and the mass values are evaluated using the following expressions:

m_i(t) = (fit_i(t) − worst(t)) / (best(t) − worst(t)),
M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t),

where fit_i(t) represents the fitness of the i-th particle at time t and M_i(t) denotes the mass of the i-th particle at time t.
Here, the complete force F_i^d(t) that is exerted on the i-th particle in the d-th dimension at time t is evaluated by the following equation:

F_i^d(t) = Σ_{j ∈ Kbest, j ≠ i} rand_j F_ij^d(t). (25)

In Equation (25), rand_j represents a random number in the interval [0, 1], and Kbest is the set of the initial K particles with the biggest masses and the best fitness values. The force applied from mass j on mass i at time t can be expressed as F_ij^d(t) and is evaluated using Equation (26):

F_ij^d(t) = G(t) (M_pi(t) × M_aj(t)) / (R_ij(t) + ε) (x_j^d(t) − x_i^d(t)). (26)
Consider that M_pi(t) refers to the passive gravitational mass correlated to the i-th particle and M_aj(t) denotes the active gravitational mass correlated to the j-th particle. ε indicates a small positive constant to avoid division by zero, and R_ij(t) indicates the Euclidean distance between the i-th and j-th particles:

R_ij(t) = ‖X_i(t), X_j(t)‖_2.
Here, the acceleration a_i^d(t) of the i-th particle at time t in the d-th direction and the velocity v_i^d(t + 1) of the i-th particle in the d-th direction are evaluated by:

a_i^d(t) = F_i^d(t) / M_ii(t),
v_i^d(t + 1) = rand_i × v_i^d(t) + a_i^d(t),

where M_ii(t) refers to the inertial mass of the i-th particle and rand_i indicates a random number in the interval [0, 1].
Here, the location x_i^d(t + 1) of the i-th particle in the d-th direction is evaluated by the following expression:

x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1).
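The GSA steps above can be sketched as follows; the search bounds, the linearly shrinking Kbest schedule, and the constants G0 and α are illustrative assumptions rather than the paper's exact settings. Note that, since M_pi = M_ii is assumed, the passive mass cancels when the force is divided by the inertial mass, so only the attracting mass remains in the acceleration:

```python
import numpy as np

def gsa_minimize(fitness, n_particles=20, dim=2, iters=100,
                 bounds=(-5.0, 5.0), G0=100.0, alpha=20.0, eps=1e-9, rng=None):
    """Sketch of the GSA loop described above, for a minimization problem."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_particles, dim))       # positions, Eq. (16)
    V = np.zeros((n_particles, dim))                  # velocities
    for t in range(iters):
        fit = np.array([fitness(x) for x in X])
        best, worst = fit.min(), fit.max()            # Eqs. (17) and (18)
        G = G0 * np.exp(-alpha * t / iters)           # Eq. (21)
        m = (fit - worst) / (best - worst + eps)      # raw masses
        M = m / (m.sum() + eps)                       # normalized masses
        k = max(1, int(n_particles * (1.0 - t / iters)))  # shrinking Kbest
        kbest = np.argsort(M)[::-1][:k]
        A = np.zeros_like(X)
        for i in range(n_particles):
            F = np.zeros(dim)
            for j in kbest:
                if j == i:
                    continue
                R = np.linalg.norm(X[i] - X[j])       # Euclidean distance
                # Eqs. (25)-(26); dividing by the inertial mass cancels M_pi,
                # leaving only the attracting mass M[j] in the acceleration.
                F += rng.random() * G * M[j] * (X[j] - X[i]) / (R + eps)
            A[i] = F
        V = rng.random((n_particles, dim)) * V + A    # velocity update
        X = np.clip(X + V, lo, hi)                    # position update
    return X[np.argmin([fitness(x) for x in X])]
```

Running the sketch on a 2-D sphere function pulls the swarm toward the origin as G decays and Kbest shrinks.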
In the EGSO algorithm, the GSO algorithm is integrated with the cat map concept to reduce the ergodicity problem, avoid premature convergence, and enhance the algorithm’s efficiency. In optimization, combining swarm intelligence (SI) optimization algorithms with chaotic mapping improves the outcomes of SI optimization. This concept has been confirmed by several instances and depends on the features of chaos, including sensitivity to the initial condition, randomness, and ergodicity. Hence, an attempt was made to integrate the chaos concept with GSO to attain good accuracy and efficiency [
23]. Until now, SI optimization approaches have generally been integrated with tent or logistic mapping models to accomplish a certain amount of progress. However, the chaotic sequence produced by logistic mapping has a non-uniform distribution: it follows Chebyshev’s distribution, where the density is lower at the center and higher at both ends. These features have some impact on the ergodicity of the optimization solution space. The chaotic sequence produced by tent mapping models follows a uniform distribution; however, the values rapidly fall into a small cycle due to the effect of word length, and the sequence therefore lacks good randomness. The cat map, a chaotic mapping generally utilized in image encryption, can generate a uniformly distributed chaotic sequence, enhancing the uniformity of the GSO distribution and improving the searching efficacy of the model.
Here, the chaotic sequence produced by the higher-dimensional cat map in the range of [0, 1] is used in place of the rand function to determine the individual distribution. The mapping values lie within the range of [0, 1] when building the multi-dimensional chaotic sequence Cat(i, j). The sequence in each dimension is independent of the others and has stronger ergodicity. These features reduce the ergodicity problem caused by random distribution and assure the ergodicity of individuals flying through the solution space. This technique improves the algorithm’s performance and prevents premature convergence.
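A two-dimensional cat map iteration can be sketched as below; the seed values and the use of the x-coordinate as the random-number substitute are illustrative assumptions, and the paper's multi-dimensional Cat(i, j) construction is not reproduced in detail:

```python
import numpy as np

def cat_map_sequence(n, x0=0.123456, y0=0.654321):
    """Generate a chaotic sequence in [0, 1) with the two-dimensional cat map:

    x' = (x + y) mod 1,  y' = (x + 2y) mod 1.
    """
    xs = np.empty(n)
    x, y = x0, y0
    for i in range(n):
        x, y = (x + y) % 1.0, (x + 2.0 * y) % 1.0
        xs[i] = x   # the x-coordinates replace uniform random draws
    return xs
```

Because the cat map's invariant measure is uniform on the unit square, a long sequence of x-coordinates is spread roughly evenly over [0, 1), which is the property the EGSO algorithm relies on.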
3. Results and Discussion
This section describes the experimental validation of the EGSO-HDLM technique, which was carried out with the help of a benchmark epidemiology dataset from the Kaggle repository [
24]. In this article, a set of 5000 samples in the positive class and 5000 samples in the negative class was considered. The proposed model was simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce GTX 1050 Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings were as follows: dropout: 0.5, batch size: 5, epoch count: 50, and activation: ReLU.
The confusion matrices generated by the EGSO-HDLM model on the test data are demonstrated in
Figure 3. For 70% of the training (TR) data, the EGSO-HDLM model identified 3330 samples in the positive class and 3465 samples in the negative class. Additionally, for 30% of the testing (TS) data, the EGSO-HDLM system identified 1425 samples in the positive class and 1496 samples in the negative class. In addition, for 80% of the TR data, the EGSO-HDLM method identified 3885 samples as positive class and 3896 samples as negative class. Finally, on 20% of TS data, the EGSO-HDLM system identified 977 samples in the positive class and 974 samples in the negative class.
Table 1 offers the detailed COVID-19 recognition outcomes of the EGSO-HDLM model for 70% of the TR and 30% of the TS data. A brief result analysis of the EGSO-HDLM model in the identification of COVID-19 for 70% of the TR data is depicted in
Figure 4. The results implied that the EGSO-HDLM model classified all the samples into the corresponding positive and negative classes. For instance, the EGSO-HDLM model identified positive samples with an accuracy of 97.07%, precision of 99.23%, recall of 94.90%, specificity of 99.26%, and F-score of 97.01%. Additionally, the EGSO-HDLM technique identified negative samples with an accuracy of 97.07%, precision of 95.09%, recall of 99.26%, specificity of 94.90%, and F-score of 97.13%.
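For reference, the per-class measures reported above can be computed from confusion-matrix counts as follows; the counts used in the check below (TP = 3330, FP = 26, FN = 179, TN = 3465) are inferred values that are merely consistent with the 70% TR figures, not taken from the paper:

```python
def class_metrics(tp, fp, fn, tn):
    """Per-class measures (in %) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # also called sensitivity
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + fn + tn),
        "precision": 100.0 * precision,
        "recall": 100.0 * recall,
        "specificity": 100.0 * tn / (tn + fp),
        "f_score": 100.0 * 2.0 * precision * recall / (precision + recall),
    }
```

With the inferred counts, the function reproduces the positive-class row for 70% of the TR data to two decimal places.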
A brief overview of the EGSO-HDLM technique in the identification of COVID-19 for 30% of the TS data is shown in
Figure 5. The results implied that the EGSO-HDLM methodology recognized all the samples as the corresponding positive and negative classes. For example, the EGSO-HDLM system identified positive samples with an accuracy of 97.37%, precision of 99.10%, recall of 95.57%, specificity of 99.14%, and F-score of 97.30%. Moreover, the EGSO-HDLM approach identified negative samples with an accuracy of 97.37%, precision of 95.77%, recall of 99.14%, specificity of 95.57%, and F-score of 97.43%.
Table 2 presents the detailed COVID-19 identification outcomes of the EGSO-HDLM system for 80% of the TR and 20% of the TS data. A brief analysis of the EGSO-HDLM approach in the identification of COVID-19 for 80% of the TR data is portrayed in
Figure 6. The results implied that the EGSO-HDLM model classified all the samples into the corresponding positive and negative classes. For example, the EGSO-HDLM methodology identified positive samples with an accuracy of 97.26%, precision of 97.34%, recall of 97.17%, specificity of 97.35%, and F-score of 97.26%. Additionally, the EGSO-HDLM model identified negative samples with an accuracy of 97.26%, precision of 97.18%, recall of 97.35%, specificity of 97.17%, and F-score of 97.27%.
A brief overview of the EGSO-HDLM approach in the identification of COVID-19 for 20% of the TS data is depicted in
Figure 7. The results implied that the EGSO-HDLM technique classified all the samples into the corresponding positive and negative classes. For example, the EGSO-HDLM method identified positive samples with an accuracy of 97.55%, precision of 97.60%, recall of 97.50%, specificity of 97.60%, and F-score of 97.55%. Additionally, the EGSO-HDLM system identified negative samples with an accuracy of 97.55%, precision of 97.50%, recall of 97.60%, specificity of 97.50%, and F-score of 97.55%.
The training accuracy (TA) and validation accuracy (VA) attained by the EGSO-HDLM method on the test dataset are illustrated in
Figure 8. The experimental outcome implied that the EGSO-HDLM system gained maximum values of TA and VA. In particular, the VA appeared to be higher than the TA.
The training loss (TL) and validation loss (VL) achieved by the EGSO-HDLM system on the test dataset are shown in
Figure 9. The experimental outcome implied that the EGSO-HDLM methodology attained the lowest values of TL and VL. In particular, the VL was lower than the TL.
A brief precision-recall analysis of the EGSO-HDLM methodology on the test dataset is portrayed in
Figure 10. The figure shows that the EGSO-HDLM technique accomplished maximum precision-recall performance for both classes.
A detailed ROC analysis of the EGSO-HDLM algorithm on the test dataset is portrayed in
Figure 11. The results implied that the EGSO-HDLM technique exhibited its ability to categorize the two distinct classes of the test dataset.
Table 3 portrays a detailed comparison study of the EGSO-HDLM model with recent models under several measures [
16].
Figure 12 illustrates an extensive comparison of the EGSO-HDLM model with the existing models. The figure indicates that the SGD and ACO models displayed poor performance, with minimal values of 90.01% and 90.67%, respectively. Additionally, the ELM model obtained a slightly enhanced value of 91.46%. Next, the MLP and LSTM models demonstrated reasonably closer values of 94.27% and 94.64%, respectively. However, the EGSO-HDLM model offered a maximum value of 97.55%.
Figure 13 demonstrates an extensive analysis of the EGSO-HDLM technique against the existing models. The figure indicates that the SGD and ACO algorithms demonstrated poor performance, with minimal values of 90.08% and 94.40%, respectively. The ELM approach obtained a slightly enhanced value of 91.18%. Next, the MLP and LSTM models demonstrated reasonably closer values of 94.58% and 93.18%, respectively. However, the EGSO-HDLM method provided the maximum value of 97.55%.
Figure 14 illustrates an extensive inspection of the EGSO-HDLM algorithm in comparison with the existing models. The figure indicates that the SGD and ACO systems demonstrated poor performance, with minimal values of 92.37% and 93.36%, respectively. Additionally, the ELM approach gained a slightly enhanced value of 92.59%. Next, the MLP and LSTM models displayed reasonably closer values of 93.66% and 94.96%, respectively. However, the EGSO-HDLM system offered the maximum value of 97.55%.
Therefore, the experimental results indicated that the EGSO-HDLM model has the ability to classify COVID-19 in an epidemiology dataset. The enhanced performance of the proposed model is due to the fusion process and EGSO-based hyperparameter tuning process.