2. Background and Literature Review
Previous studies [
5,
6,
7] applied a modified SEIR compartmental model to predict the COVID-19 epidemic dynamics. This study incorporates a basic SEIR model [
8] to describe the dynamic process of epidemic propagation. Given a population that contains a group of individuals, a compartment SEIR model includes four states at time
t: susceptible (S), exposed (E), infected (I), and recovered (R), as shown in
Figure 1. Individuals in the susceptible state do not have the disease at time
t but may become infected if they come in contact with an infectious person. The infected state includes individuals with a positive COVID-19 test result at time
t who may infect a susceptible individual if they come in contact with each other. Individuals in the recovered state have recovered from COVID-19 and are no longer contagious at time
t. Recovered individuals return to the susceptible state.
The rates of change of the above-mentioned states are represented by a set of coupled ordinary differential equations (ODEs), as shown in Equations (1)–(4).
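Equations (1)–(4) are not reproduced in this text. As a minimal sketch, assuming the textbook SEIR form, the four coupled ODEs can be integrated with a forward Euler step. The names `beta` (transmission rate), `sigma` (incubation rate), and `gamma` (recovery rate), as well as all numerical values, are illustrative assumptions and not the parameters of this study. The optional `omega` term is a hypothetical rate at which recovered individuals return to the susceptible state.

```python
def seir_step(state, beta, sigma, gamma, omega=0.0, dt=1.0):
    """Advance (S, E, I, R) by one forward-Euler step of the textbook SEIR ODEs."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n   # S -> E: contact with infectious individuals
    new_infected = sigma * e         # E -> I: end of incubation
    new_recovered = gamma * i        # I -> R: recovery
    waned = omega * r                # R -> S: assumed return to susceptibility
    return (
        s + dt * (waned - new_exposed),
        e + dt * (new_exposed - new_infected),
        i + dt * (new_infected - new_recovered),
        r + dt * (new_recovered - waned),
    )


def simulate(initial, days, **params):
    """Return the list of daily states, starting with the initial state."""
    trajectory = [initial]
    for _ in range(days):
        trajectory.append(seir_step(trajectory[-1], **params))
    return trajectory


# Illustrative values only: 10,000 people, 10 initially infected.
trajectory = simulate((9990.0, 0.0, 10.0, 0.0), days=60,
                      beta=0.5, sigma=0.2, gamma=0.1)
```

Because every flow leaves one compartment and enters another, the total population stays constant in this sketch, which is a quick sanity check on any implementation of the compartmental equations.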
Table 1 summarizes the characteristics and applications of COVID-19-related existing research studies.
As shown in
Table 1, most of the COVID-19-related research studies did not consider human mobility owing to limitations of the frameworks used. Ibrahim and Murat [
24] and Iacus et al. [
25] showed the relationship between the initial spread of COVID-19 and human mobility. Iacus et al. [
26] showed that human mobility can explain up to 92% of the initial spread in France and Italy. Ying et al. [
26] considered the potential effects of various intra-city mobility restrictions on preventing the transmission of the disease. This relationship between disease spread and human mobility led a number of countries to implement human mobility-based policies [
27,
28] such as social distancing, stay-at-home policy, and several travel bans. Bonaccorsi et al. [
29] and Pepe et al. [
30] indicated a significant relationship between human mobility and restrictions on public transport. However, detailed prediction models that consider human mobility have been proposed comparatively rarely, owing to data issues and modeling complexities.
To overcome this issue, this research incorporates a deep learning approach into a COVID-19 SEIR model considering human mobility. There are several relevant research studies using deep learning methods.
Table 2 summarizes the characteristics of the relevant research studies using a deep learning approach.
A number of research studies have applied deep learning approaches to detect epidemic patterns or predict future infectious features; the characteristics of these studies are summarized in
Table 2. However, the integration of an ODE-based compartment framework into a deep learning framework has been studied comparatively little so far. This research study combines both approaches and shows their effectiveness in predicting the epidemic trend of COVID-19.
3. Methodology of Hybrid Deep Learning Framework based on Meta-population
This paper proposes a new and effective framework, integrating an ODE-based meta-population model and deep neural network modeling [
39,
40] to examine two hypotheses. The first hypothesis is that COVID-19 spread can be analyzed using an SEIR model; this was explained in the previous section. The second hypothesis is that a hybrid deep learning model can predict the COVID-19 spread considering human mobility. This section and the following section explain the second hypothesis and test it with relevant case studies.
The prediction of disease propagation is achieved using an extended SEIR model. Meta-population [
41,
42,
43] refers to a group of separated sub-populations of the same species, which are connected by an interaction network. Large-scale epidemic outbreaks, such as global pandemics, can be modeled using the propagation of pathogens through a meta-population network. Cities within a country are modeled as sub-populations, and human mobility between them is modeled with the network connecting the sub-populations, as shown in
Figure 2. Individuals moving among cities can be considered disease carriers. This research focuses on two cities in South Korea: Daegu and Seoul. Daegu is the city where the outbreak started in South Korea, and Seoul is the capital city with the highest trend of ongoing transmission. A susceptible individual of one sub-population may contact infectious individuals from two sources: infectious individuals from the same sub-population, or infectious visitors from the other sub-population.
The system in the meta-population model is divided into several connected sub-populations, along which individuals travel via transportation. Inside each sub-population, homogeneous mixing is assumed. This indicates that the mathematical modeling can capture the travel flows among sub-populations. The epidemic process of an infectious disease can be decomposed into different timelines of each sub-population, which can be represented as spatial and temporal processes.
Figure 3 shows the SEIR-based meta-population model considering human mobility.
Equations (5)–(12) represent sets of ODEs of disease transmission between two sub-populations in the meta-population model.
The parameters and terms of the model are provided in
Table 3.
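Equations (5)–(12) are not reproduced in this text. The following is a minimal sketch of how two SEIR sub-populations can be coupled through mobility, under a loudly stated assumption: the force of infection in each city is taken to be proportional to its own infectious individuals plus a fraction of the infectious individuals of the other city who visit it. The function name `two_city_step`, the `mobility` fractions, and all numerical values are hypothetical and are not taken from Table 3.

```python
def two_city_step(states, beta, sigma, gamma, mobility, dt=1.0):
    """One Euler step for two coupled SEIR sub-populations.

    states   : [(S, E, I, R), (S, E, I, R)] for the two cities
    beta     : per-city transmission rates (assumed)
    gamma    : per-city recovery rates (assumed to differ between cities)
    mobility : mobility[j][i] is the assumed fraction of city j's infectious
               individuals who visit city i each day
    """
    updated = []
    for i, (s, e, inf, r) in enumerate(states):
        j = 1 - i                                   # index of the other city
        n = s + e + inf + r
        visitors = mobility[j][i] * states[j][2]    # infectious visitors from city j
        force = beta[i] * (inf + visitors) / n      # assumed force of infection
        new_exposed = force * s
        new_infected = sigma * e
        new_recovered = gamma[i] * inf
        updated.append((
            s - dt * new_exposed,
            e + dt * (new_exposed - new_infected),
            inf + dt * (new_infected - new_recovered),
            r + dt * new_recovered,
        ))
    return updated


def run(states, days, **params):
    """Apply two_city_step repeatedly and return the final states."""
    for _ in range(days):
        states = two_city_step(states, **params)
    return states


# Illustrative scenario: the outbreak starts in the first city only.
start = [(999_000.0, 0.0, 1_000.0, 0.0), (1_000_000.0, 0.0, 0.0, 0.0)]
params = dict(beta=(0.4, 0.4), sigma=0.2, gamma=(0.1, 0.12))
coupled = run(start, 30, mobility=[[0.0, 0.01], [0.01, 0.0]], **params)
isolated = run(start, 30, mobility=[[0.0, 0.0], [0.0, 0.0]], **params)
```

With zero mobility the second city stays uninfected, whereas even a small visiting fraction seeds an outbreak there, which is the qualitative behavior the meta-population coupling is meant to capture.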
As human-to-human transmission has been confirmed in the spreading of COVID-19, gatherings of people, contacts, or travels among cities (e.g., intra-city contact and intercity travel) of infected and exposed individuals have been the main reason for the spread of the virus. Based on the collected data, this study constructs the migration matrix at time
t, which is given by Equation (13).
where K is the number of cities or regions (herein, K = 17), and each entry of the migration matrix is the rate of people traveling from city i to city j at time t. The migration matrix is non-symmetric (the rate from city i to city j generally differs from the rate from city j to city i) because the traffic between cities is different at any given time. The number of outflow migrants of each city at time t is given by Equation (14).
The number of inflow migrants of each city at time t is given by Equation (15).
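Equations (13)–(15) are not reproduced in this text. As a hedged illustration of the bookkeeping they describe, the sketch below assumes that the outflow of a city is its population multiplied by the sum of its outgoing rates, and that the inflow is the sum over the other cities of their populations multiplied by their rates toward that city. The three-city matrix and the populations are invented for illustration (the study uses K = 17), and the function names are hypothetical.

```python
def outflow(migration, population, i):
    """People leaving city i: its population times the sum of its outgoing rates."""
    rates = (rate for j, rate in enumerate(migration[i]) if j != i)
    return population[i] * sum(rates)


def inflow(migration, population, i):
    """People arriving in city i from every other city j."""
    return sum(population[j] * migration[j][i]
               for j in range(len(population)) if j != i)


# migration[i][j]: assumed daily rate of people traveling from city i to city j.
migration = [
    [0.0,   0.02, 0.01],
    [0.03,  0.0,  0.01],
    [0.005, 0.02, 0.0],
]
population = [1000.0, 2000.0, 4000.0]

cities = range(len(population))
total_out = sum(outflow(migration, population, i) for i in cities)
total_in = sum(inflow(migration, population, i) for i in cities)
```

The matrix is deliberately non-symmetric, so the flow between two cities differs by direction, yet the total outflow over all cities still equals the total inflow because every traveler leaves one city and enters another.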
In the SEIR model, , , , and are defined as , , , and , respectively, where is the daily count. As the incubation period for COVID-19 can reach 14 days, the number of exposed individuals (who show no symptoms but are infectious) is crucial for the spread of the disease. State , which is not available from official data, is an important state in this model. In a city , the four states are defined by , , , and at time . The total susceptible population, , is the number of infected individuals in city . If city i has a total population of and the percentage of infected population is , then .
The provided meta-population model incorporates human migration patterns in order to capture the epidemic dynamics. For each city, the daily increase in the number of infected cases includes the inflow of infected individuals from other cities. In real situations, the inflow and outflow of exposed individuals to and from a city are important and must be estimated by the epidemic model. Thus, given the rate of people moving from city i to city j on day t, the number of infected individuals moving from city i to city j is given by Equation (16).
As the total migration rate out of a city is the sum of its rates toward all other cities, the number of infected cases that have migrated out of that city is determined from Equation (17), which also uses the population of the city on the same day.
Thus, the total number of infected cases on day
in city
is given by Equation (18), where
, and
is the infection rate in city
on day
.
Moreover, this paper introduces
(
) to the degree of migration unlikeness. In this manner, Equation (18) is converted into Equation (19).
Incorporating the migration dynamics, the increase in exposed individuals on day t in each city is determined by Equation (20), which uses the infection rate of exposed individuals and the infection rate of susceptible individuals in that city.
Then,
and
are computed from Equations (21) and (22), respectively. In this study, the recovered individuals are assumed to stay in city
, and the recovery rates in different cities are assumed to differ owing to the varied treatments and availability of medical facilities.
The population of city
on day
increases or decreases owing to intercity migration, as shown in Equation (23), where
.
Then, the instant susceptible population in city
j at time
t is defined in Equation (24), where
.
As shown in Equation (25), the parameters in the provided meta-population model are estimated using the historical COVID-19 tracking data.
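Equation (25) is not reproduced in this text. As a minimal sketch of parameter estimation from historical data, assuming a least-squares criterion, the example below selects the transmission rate whose simulated infected series is closest to an observed series. The grid search, the single fitted parameter `beta`, and the synthetic "observed" series are all simplifying assumptions for illustration and are not the estimation procedure of this study.

```python
def simulate_infected(beta, sigma, gamma, initial, days):
    """Daily infected counts from a basic SEIR model (forward Euler, dt = 1 day)."""
    s, e, i, r = initial
    n = s + e + i + r
    series = [i]
    for _ in range(days):
        new_e = beta * s * i / n
        new_i = sigma * e
        new_r = gamma * i
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        series.append(i)
    return series


def squared_error(predicted, observed):
    """Sum of squared differences between two equally long series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_beta(observed, candidates, sigma, gamma, initial):
    """Return the candidate beta that minimizes the squared error."""
    days = len(observed) - 1
    return min(
        candidates,
        key=lambda b: squared_error(
            simulate_infected(b, sigma, gamma, initial, days), observed),
    )


initial = (9990.0, 0.0, 10.0, 0.0)
# Synthetic stand-in for tracking data, generated with a known beta of 0.4.
observed = simulate_infected(0.4, 0.2, 0.1, initial, days=30)
candidates = [k / 100 for k in range(10, 91)]
best_beta = fit_beta(observed, candidates, sigma=0.2, gamma=0.1, initial=initial)
```

On noise-free synthetic data the search recovers the generating value. With real tracking data, several parameter combinations can fit almost equally well, which is one reason why estimates from historical data alone may be unreliable.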
An estimated number of infected cases in each city is generated using the provided meta-population model. However, it is difficult to capture accurate parameters using historical data alone, and an incorrect estimate may produce biased or erratic results. To overcome this issue, this study combines the meta-population model with a deep learning approach.
Figure 4 shows the architecture of the hybrid approach using deep learning algorithms.
As shown in
Figure 4, the meta-population model with the estimated parameters generates two types of outputs: the time-dependent reproduction number and the numbers of susceptible, exposed, infected, and recovered people. However, it is difficult for these outputs to match the real COVID-19 data at time
t accurately. For this reason, the gap is minimized using a deep learning model; a general DNN or LSTM model is considered and tested.
The two outputs of the meta-population model and the number of infected people in the real COVID-19 data (at time t-1) are used as inputs to the deep learning model. The output of the deep learning model is the set of parameters of the meta-population model. The weights of the hybrid deep learning model are updated to minimize the error, defined as the gap between the real number of infected people at time t and the number estimated by the meta-population model at time t. Backpropagation is used to minimize the error.
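The training loop described above can be illustrated with a deliberately small stand-in. In the sketch below, a single sigmoid unit replaces the DNN or LSTM, a one-step SIR-style update replaces the full meta-population model, the only input is the infected fraction at time t-1, and the gradient is derived by hand with the chain rule instead of a deep learning library. All names and values are hypothetical; the example only shows how an error measured on the epidemic model output can be propagated back to the weights that generate the model parameter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic(beta_true, gamma, n, i0, days):
    """Rows of (S at t-1, I at t-1, I at t) generated with a known beta."""
    s, i = n - i0, i0
    rows = []
    for _ in range(days):
        new_cases = beta_true * s * i / n
        i_next = i + new_cases - gamma * i
        rows.append((s, i, i_next))
        s, i = s - new_cases, i_next
    return rows


def train(rows, n, gamma, lr=0.5, epochs=200):
    """Fit weights (w, b) so that beta = sigmoid(w * x + b) reproduces I at t."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for s_prev, i_prev, i_true in rows:
            x = i_prev / n                      # input: infected fraction at t-1
            beta = sigmoid(w * x + b)           # network output: model parameter
            i_hat = i_prev + beta * s_prev * i_prev / n - gamma * i_prev
            err = (i_hat - i_true) / i_prev     # relative error at time t
            total += err ** 2
            # Chain rule: d(err^2)/d(beta), then d(beta)/d(z) for the sigmoid.
            grad_beta = 2.0 * err * s_prev / n
            grad_z = grad_beta * beta * (1.0 - beta)
            w -= lr * grad_z * x
            b -= lr * grad_z
        history.append(total / len(rows))
    return w, b, history


n, gamma = 10000.0, 0.1
rows = make_synthetic(beta_true=0.3, gamma=gamma, n=n, i0=10.0, days=20)
w, b, history = train(rows, n, gamma)
learned_beta = sigmoid(w * rows[-1][1] / n + b)
```

In this toy setting the mean squared relative error falls over the epochs and the learned parameter approaches the value used to generate the data. A practical implementation would use an automatic differentiation framework and the full set of inputs described in the text.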
The novelty of this hybrid deep learning method lies in enhancing the accuracy of parameter estimation for the SEIR-based meta-population model. The following section shows the effectiveness of the proposed hybrid deep learning model through numerical analysis, and the results are compared with the existing LSTM and DNN methods.
4. Numerical Analysis and Performance Evaluation Using the Meta-Population-Based Hybrid Deep Learning Framework
To demonstrate the effectiveness of the proposed framework, this study uses data on COVID-19 cases in South Korea. As the transmission of the disease is still ongoing, the parameters affecting the simulation results change dynamically with time. This study focuses on observing how the established meta-population model leads to a higher accuracy for outbreak prediction using parametric analysis. The parameters were learned using the DNN model. The approach allows us to determine the real-time dynamics of the reproduction number and analyze its relationship with other factors directly affecting the epidemic spread, such as the number of daily infected cases and people’s mobility levels. The two datasets [
4] used for numerical testing are detailed in
Table 4. The datasets are provided by the Korea Centers for Disease Control and Prevention (KCDC). The datasets include incidence types and relevant human mobility information of COVID-19. This study analyzes human mobility from the viewpoint of inner/intra-city mobility.
The basic reproduction number in both cities follows the patterns shown in
Figure 5.
Then, the proposed framework is compared with a general LSTM model and a general deep neural network model. The LSTM and DNN models are trained using the same dataset. Moreover, this study uses two different modules in the deep learning core of the proposed method.
Table 5 shows the characteristics of the compared methods.
As shown in
Table 5, the architecture and hyper-parameters of the methods are determined using the COVID-19 dataset in South Korea.
Figure 6a shows the training errors (general loss and Root Mean Square Error (RMSE)) of the proposed hybrid deep learning framework incorporating the meta-population model and LSTM model. The loss is calculated as the gap between the actual reproduction rate of COVID-19 and the reproduction rate estimated using the proposed method. The architecture and relevant hyper-parameters, such as the number of layers, the optimizer, and the learning settings, are determined through experimental analyses.
Figure 6b compares the performance of configurations with different numbers of layers. The number of layers and other hyper-parameters are determined from these experimental tests.
Table 6, illustrated in
Figure 7, shows several values estimated using the different frameworks.
The values of the basic reproduction number obtained using the four methods are also compared with the actual number of infected people, as shown in
Figure 8.
A comparison between the frameworks presented in
Table 7 is illustrated in
Figure 9.
Figure 9 shows experimentally that the proposed method, incorporating the meta-population model and LSTM, performs best among the four methods.
The numbers of infected people obtained using the four methods are also compared with the actual number of infected people, as shown in
Figure 10.
The numerical tests indicate that the proposed method has better performance than other methods. In particular, the hybrid deep learning framework using the meta-population model and LSTM is more accurate than that using the meta-population model and DNN, as indicated by its lowest error value. Similar tests were performed for the cases in Daegu; the results are shown in
Table 8, which is illustrated in
Figure 11.
The actual and predicted values of the reproduction number for Daegu using the proposed method are illustrated in
Figure 12.
The actual and predicted numbers of infected people using the four methods shown in
Table 9 are illustrated in
Figure 13.
The actual and predicted numbers of infected people in Daegu using the four methods are illustrated in
Figure 14. The numerical tests indicate that the proposed method has better performance than other methods. In particular, the hybrid deep learning framework using the meta-population model and LSTM is more accurate than that using the meta-population model and DNN, as it achieves the lowest error.
The overall performances are provided in
Table 10. The comparison results are calculated as the averages of 50 repetitions. The averages are provided in
Table 10, and their standard deviations are less than 2.28 in
and 0.20 in
, respectively. The effectiveness of the hybrid deep learning framework for predicting the COVID-19 epidemic pattern is demonstrated experimentally by its achieving the lowest RMSE compared with the other three methods.
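For reference, the evaluation metric and the aggregation over repeated runs can be sketched as follows. The RMSE definition is standard. The use of the sample standard deviation is an assumption, as the text does not state which estimator was used, and the numbers in the example are illustrative rather than results from Table 10.

```python
import math
import statistics


def rmse(predicted, actual):
    """Root mean square error between two equally long series."""
    if len(predicted) != len(actual):
        raise ValueError("series must have the same length")
    squared = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared) / len(actual))


def summarize(rmse_per_run):
    """Mean and sample standard deviation of the RMSE over repeated runs."""
    return statistics.mean(rmse_per_run), statistics.stdev(rmse_per_run)


# Illustrative values only: RMSE from three hypothetical repetitions.
mean_rmse, std_rmse = summarize([1.0, 2.0, 3.0])
```

Reporting the standard deviation next to the mean shows whether a difference between two methods is larger than the run-to-run variation.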
5. Conclusions
The epidemic prediction of infectious diseases is one of the most widely studied research areas. A number of research studies using time-series analysis, regression studies, and other statistical methods have been conducted to explain epidemic dynamics. In addition, deep learning approaches have been introduced for forecasting. These methods have been applied for tracking COVID-19 patterns. The emergence of the COVID-19 virus has changed every aspect of our lives and has devastated all societies. Until effective vaccines come into wide use, it is important to accurately predict the epidemic patterns of COVID-19. However, it is difficult to estimate the pattern owing to the characteristics of COVID-19. As infection is primarily mediated by the droplets of infected people, human mobility has to be considered and incorporated in epidemic analysis.
As COVID-19 follows an SEIR-based infection model, the proposed framework implements an SEIR compartment model. In order to incorporate human mobility among regions, the SEIR compartment model is expanded with relevant human mobility-based modeling parameters and terms. Although the expanded epidemic model has advantages over simple compartment models, its performance depends on the accurate estimation of the epidemic model parameters. To increase the accuracy of parameter estimation, deep learning approaches are deployed. The proposed hybrid deep learning framework consists of an epidemic meta-population model and a deep learning model. The estimated epidemic status and real COVID-19 historical data are input to the deep learning model, and the parameters of the meta-population model are generated as output. The weights of the deep learning model are updated to minimize the gap between the real COVID-19 status and the estimated status computed from the meta-population model. To demonstrate the effectiveness of the proposed hybrid deep learning model, the estimation results are compared with other forecasting methods. The numerical analysis demonstrates that the hybrid deep learning framework using the meta-population model and LSTM model exhibits the best performance among the tested methods. Therefore, the proposed framework is expected to contribute to the effective control of COVID-19 infection.
In future studies, the epidemic patterns of various infectious diseases can be predicted using the proposed hybrid deep learning framework. As the spread of an infectious disease follows a specific epidemic pattern, the meta-population model of the proposed hybrid deep learning framework can be modified and applied depending on the disease scenario. In addition, the deep learning model used can be replaced with more effective models. This research focuses on the prediction of COVID-19 spread considering human mobility. While a number of research studies propose effective restrictions on human mobility, the proposed framework can support the prevention and control of the COVID-19 spread.