1. Introduction
The mathematical modeling of epidemics has a long history (see, for example, [
1]). However, the spread of COVID-19 has given this area of research a significant expansion and advancement. Due to the avalanche of publications on this topic, we cannot discuss all areas in detail here. We only note that along with the classical SIR-type models based on ordinary differential equations and improved recently (see, for example, [
2,
3]), models have appeared that include partial differential equations (see, for example, [
4]), models with stochastic differential equations (e.g., [
5]), agent-based models [
6], etc. In turn, each of these areas has received internal development and generalization. For example, in SIR models, a direction associated with the use of fractional derivatives has emerged (see, e.g., [
7], etc.). In our work, we study some extension of the classical SIR-type model and do not use other approaches.
SIR models have the form of specific systems of ordinary differential equations and contain coefficients that have an important epidemiological meaning. However, some of the coefficients, and sometimes all, are unknown. Therefore, the question of finding them is very relevant, using, for example, data on the dynamics of the number of infected people and, possibly, other data. Thus, an inverse coefficient problem arises for the corresponding SIR system. Methods for solving such inverse problems are well developed. An overview of such methods can be found, for example, in [
8]. Specific implementations of the methods, based mainly on various optimization algorithms, are detailed in numerous works (see, for example, [
9,
10,
11,
12,
13,
14,
15] and others). A detailed analysis of these approaches and a block diagram of their connection are given, for example, in [
16].
In the classical formulation of inverse problems for SIR models, the coefficients are assumed to be constant. However, it turned out that such models are not always adequate to epidemiological data, and, as a rule, do not explain the emergence of pandemic waves. In this regard, we note a recent publication [
17], which proposes a new multi-wave SIR model that can explain the generation of pandemic waves. The principal feature of this model is a new form of differential equations and the use of functions with a retarded argument, while the coefficients of the model are constant.
In our work, we use two other approaches within the framework of SIR models. Our goal is rather modest: an adequate description of the data of the inverse problem, namely, the dynamics of the number of infected for some countries. The first approach uses time-varying coefficients in our SIR model, and here we develop the results of work [
18]. To adequately reproduce the data, we solve the inverse coefficient problem for the model on the class of piecewise constant coefficients with some additional restrictions on the latter. This makes it possible to model inverse problem data with sufficiently high accuracy for a number of countries, but does not describe the generation of epidemic waves. This feature can be explained by the fact that standard SIR models, as a rule, are written for closed systems that do not take into account the external flow of infections. Therefore, to describe the emergence of epidemic waves, we supplement the modeling with variable coefficients by assuming that there are unknown external sources of infection that change over time. These can be in the aggregate latent carriers of the infection, which are activated at different times of the year due to weather conditions, as well as carriers of the infection arriving from other countries and other sources. Mathematically, these sources are modeled by including an additional term, represented by an unknown function of time, in one of the equations of our SIR-type model. To find this term and the coefficients, the problem of minimizing the discrepancy between the data and their analogs calculated from the model is solved under restrictions on piecewise constant coefficients and on the source. Based on the coefficients and sources found, it is possible to fairly accurately reproduce data on infections from several waves of the epidemic for a number of countries. We solved this problem using data from The Johns Hopkins Coronavirus Resource Center (CRC) [
19] for Austria, the Czech Republic, Germany, France, Italy and Russia. The time of the receipt of data is 10 February 2022. Comparison of COVID-19 cases from different sources has been studied [
20]. Due to the size of the article, we are unable to give all these results in detail, so we include figures in the article that correspond only to the calculations for Austria and Russia. For the rest of the countries, we confine ourselves to presenting some numerical data.
2. Modeling of Individual Waves of Epidemic
The SEIR model is widely used in mathematical epidemiology. In this section, we use its modification with variable coefficients from [
18]. The model has the following form:
It will be used for different countries with different values for number
N of people included in epidemic process. The main variables of the model are defined as follows:
S(
t) is the proportion of people who can be infected at time
t;
E(
t) is proportion of infected people in whom illness is not identified yet at time
t, but they are able to infect surrounding people; and
I(
t) is the proportion of ill people with confirmed diagnoses at time
t. One can calculate the number of people from the entered relative values by multiplying by
N. The choice of total number
N of people for considered countries is discussed [
9]. Moreover, the proportion
R(
t) of recovered people at time
t can be determined from system (1) by using the equation:
. However, we do not use this equation, because our goal is an adequate modeling of the quantity
I(
t). So, we consider in this section the SEI model (1).
Note that we assume that coefficients of Equation (1) can be variable unlike the standard SEIR model. The time t was measured in days. The coefficients of system (1) have the following meaning: β(t) is proportional to probability of infection and is measured in 1/days; γ(t) is the reciprocal of the mean time to diagnosis of infection, 1/days; δ(t) is the inverse value of the average time to cure the patient from the moment of diagnosis, 1/days.
The coefficients can vary due to health measures in specific country. For example, a decrease in β(t) can consider the response to introduced restricted measures. A weakening of these measures can result in an increase in β(t). An Increase in γ(t) can be interpreted as a decrease in the average time of the determination of illness, while an increase in δ(t) reflects a decrease in the average time of recovering.
Previously, in [
18], the inverse problem of finding
from the data
was posed. Similar to [
19], the input data for the inverse problem were taken from the statistics of
given in the database of The Johns Hopkins CRC [
19] and were computed as
for some countries. To model piecewise constant coefficients, we divide the time axis into segments of the form
with constant length
. Then, we solve the inverse problem with constant values
separately at each segment. The corresponding solution is found out by the minimization of the following discrepancy functional at
:
under a priori constraints of the form
with estimates
known from the literature. Here,
is the solution of (1) for given
Collecting the results of such minimization for all segments, we obtain a solution in the form of piecewise constant functions
and of a set
. However, such a problem can be ill-posed for each segment, and this is expressed in an ambiguous solution to the discrepancy minimization problem. To isolate a single solution, we used a special variant of Tikhonov regularization [
21]. We present it in the form of computation method 1.
Method 1.
Step 1. Set the values of the regularization parameter ().
Step 2. For each parameter
and for each segment
,
, we minimize on the set
the Tikhonov functional of the form
where the functional
determines the relative deviation of the parameters from their values
obtained at the previous minimization on the segment
. For
n = 0, these quantities are equal to zero. So, at step 2 we obtain the coefficients
, and solving problem (1) for these coefficients, we then find the functions
Step 3. Choice of the regularization parameter . To calculate it, we first find the averages across all segments for each . Next, we build a Pareto curve, , which in the theory of regularization is called an L-curve, and next find on it the point closest to the origin. The value of the parameter corresponding to this point is taken as the optimal value of the regularization parameter .
Step 4. We repeat step 2 for
and for each segment
and find out the optimal coefficients
. Combining these coefficients for all intervals, we obtain the regularized piecewise coefficients
Step 5. At the end, we solve system (1) for each interval with the found coefficients , and so we find the function , which is an approximation for . At the same time, we find the functions .
The procedure described in Method 1 has the following meaning. We try to simultaneously minimize two functionals for each time interval, namely the discrepancy and . This means the best approximation of the problem data using the model while ensuring the smallest change in the model parameters when moving from the previous time interval to the next. To do this, we use a combination of and in the form of the Tikhonov functional. The weight parameter shows what is more important for us, to accurately approximate the data or to provide small changes in the model parameters. At step 3, the weight selection procedure gives a compromise value for all time intervals at once. So, we provide the smallest deviation of the calculated value from the data for all points in time with the smallest change in the model parameters.
A result of such a procedure for solving the inverse problem is shown in
Figure 1 for Russia. All calculations were carried out in MATLAB. The top subplot shows the initial data
and calculated approximation
. The values are converted to the number of people. These curves almost do not differ graphically. The middle subplot shows calculated piecewise coefficients
. The bottom subplot shows the residual values (discrepancy),
, when minimizing at each time interval. An important characteristic of Method 1 is the average value for
over all time intervals,
, which characterizes the quality of the approximation of data
by the found approximate function
. For Russia, we obtain
. For other countries, these values are presented in
Table 1.
Thus, the presented procedure for solving the inverse problem makes it possible to approximate the data for model (1) with good accuracy (2–8%). It is interesting to note that the coefficients calculated in solving the inverse problem generally change slowly, and this is what we wanted by applying Method 1.
Unfortunately, model (1) does not describe well all waves of the COVID-19 epidemic. Such sharply growing waves as for the Omicron strain are approximated by this model with a significant error. For such waves, the model needs to be corrected by including a mechanism for generating successive waves.
3. A mechanism of Successive Waves Simulation in SEI Model
Some models are known from the literature that describe the mechanism of the generation of epidemic waves (see, e.g., [
17]). In this article, we consider one of the possibilities for changing the SEI model (1) so that such generation occurs in it as well. Formally, we ensure this by including an additional term
f(
t) in the third equation of system (1). After multiplying by
N, the function
f(
t) represents the additional number of infected people per day. Therefore, the new modification of the SEI model has the form
with the same initial conditions
. This additional term can, for example, be considered as a source of infections associated with hidden carriers that are seasonally activated due to weather changes. Moreover, this term may include the transition of infection into the country in question from abroad. For model (3), we first set the inverse problem of finding the values
and the function
using the same data
as in
Section 2. We again assume piecewise constancy of the quantities
on time intervals
and try to minimize a new discrepancy of the form
for such interval under restrictions
and
. Here,
is the solution to system (3) for given quantities
.
Again, taking into account the ambiguity of the solution of such an inverse problem, we apply for its solution
Method 2, a modification of Method 1. It is based on introduction of the external source of infection
. In this modification, we replace the quantities
with their analogs
and use the discrepancy of the form
instead of
. Moreover, the functional (2) is replaced by the following:
As a result of applying Method 2, we obtain another approximate counterpart of the data and corresponding piecewise constant optimal coefficients of model (3), .
Now we present the results of numerical experiments on modeling epidemic waves using model (3) and Method 2 for some countries.
Figure 2 refers to Russia. The top subplot represents the curves
and
. One can see their practical coincidence. This subplot also shows a curve representing the dynamics of the changes in external sources of infection,
. Since curves
and
are very different in scale, graph
is given instead of
. All values are converted to the number of people.
The lower subplot demonstrates the found dynamics of the coefficients. Next,
Figure 3 refers to Austria. Again, we present in the upper subplot the quantity
instead of
along with
and
.
Similar calculations for other countries show that data approximations, that is the average discrepancies
, for model (3) are generally better than those for model (2), and this is due to using the source
. This is presented in the
Table 2.
At the same time, the coefficients vary insignificantly, while the coefficient changes more markedly. In all numerical experiments, it turned out that the found waves of external sources of infection, , were ahead of the wave of infection by 10 or more days.
5. Discussion and Conclusions
Numerical experiments with the identification of model (1), which includes piecewise constant coefficients, show that for a number of countries the data to the inverse problem, i.e., infection curves, can be approximated with a high accuracy of 2–8%. This is true for modeling both an isolated wave of an epidemic and a sequence of waves that do not rise sharply. In this case, the found piecewise constant coefficients generally change slightly. For example, the coefficient
(see
Figure 1) approximately changes by 3%. Identification is carried out here using the proposed Method 1 based on Tikhonov regularization. However, for infection waves with a sharp increase, the approximation errors grow tenfold.
To overcome this trouble, we introduce an additional term
into model (1). The term takes into account external and latent sources of infection. For model (3), an identification process similar to Method 1 can be proposed. Then, for the data at our disposal, the model makes it possible to approximate all successive infection peaks, up to the Omicron peak, with an accuracy of about 2–4% for different countries. This is somewhat better than for model (1). This is significantly better (by a factor of 2 to 5) than the results that can be obtained using the algorithm from [
18] when trying to approximate several epidemic waves without the term
using spline variable coefficients.
When modeling the Omicron peak, we encountered the lack of a series of data in the Hopkins Coronavirus Resource Center database. Therefore, we had to use other data that can be extracted from this database, namely the number of infected people per day. Applying another modification of our Method 1 for the identification of model (3) with such input data for some countries, we approximated the infection curves for all waves, including the Omicron wave, with an accuracy of about 10–30%.
The three parts of our work noted here also show the degree of applicability of the SEI models used to describe the COVID epidemic. These models can be applied fairly reliably to approximate many waves of infection curves with high precision and can be used with a certain degree of confidence to identify external and hidden sources of infection. It distinguishes our models from other models with variable coefficients known in the literature.