1. Introduction
Malware is one of the most ubiquitous cyberthreats of modern times. The Internet, as the natural environment of malware, creates an incredibly complex and ever-diversifying landscape for exploitation. As a consequence, this environment increasingly resembles a living ecosystem, with malware behaving (and evolving) much like diseases, while defense systems try to keep up with more robust measures, like the immune system of our bodies. The advent of the Internet of Things deepened these similarities even further. Billions of Internet-connected devices with low specifications and limited security capabilities are deployed every year [1,2], creating even more niches to be exploited, and malware specific to IoT readily emerged. Studies on cybersecurity are therefore more urgent than ever to lay the foundation for the sustainable development of the Internet in the age of the Internet of Things.
Weak security measures in IoT devices and networks are a chronic problem that is continuously exploited by self-replicating malware. In recent years, IoT botnets performed DDoS attacks that affected millions of users [3]. Mirai, the most successful of them, infected half a million devices and took down services like Netflix and Twitter for hours [2,4]. To hide the origin of Mirai, its creators uploaded the code to Internet forums, leading to a whole strain of Mirai “offspring”, each “evolving” with particularities to avoid the countermeasures to Mirai. Evolving threats effectively led to the creation of a field of malware taxonomy, with reports on diverse strains of traditional malware [5], malware for mobile devices [6], and malware specific to IoT [7].
With so many similarities, it is no surprise that the scientific study of malware drew on the field of biological epidemics. Since the seminal work of Kermack and McKendrick [8], epidemic models have been developed as stages of disease progression, dividing the affected biological populations into groups experiencing specific stages: susceptible to infection, incubating, infectious, in recovery, quarantined, immunized, or deceased. All these biological concepts were then adapted or directly applied to “populations” of malware hosts, the network devices (see Table 1 for a summary of the notation used throughout this paper). While this is the most widely used baseline approach to modeling malware, many mathematical and simulation studies have explored different modeling methodologies over the years, but most of them still incorporate compartmental models.
Currently, the majority of studies develop models in the form of differential equations/Markov chains [9,10,11,12,13], agent-based models [14,15], cellular automata [16], or purely stochastic models based on simulations [17,18], but hybrid models are also appearing [19,20]. A more detailed explanation and discussion of the state of the art is presented in Section 2.
Markov chains based on compartmental models are the most popular approach in the literature. While reviewing studies on models of random propagation, we noticed that all of them assume a simple model of random propagation and focus their contribution on modeling different compartmental states and parameters to make the model more realistic. The most popular model used to describe random propagation (called the “standard nonlinear model” in this paper) is shared by more than 95% of the reviewed literature and consists of a very simple expression, $\beta S I / N$, where $\beta$ is the rate of infection, S is the number of susceptible devices, I is the number of infectious devices, and N is the total number of devices in the network. While this model is accurate enough to approximate the dynamics of random propagation, it ignores one of the two mechanisms of inefficient target selection during random malware propagation, resulting in an overestimation of the speed of propagation. These dynamics were never explained in detail or explicitly in the literature before our work, nor had an exact model of random malware propagation been developed to take them into account. In our previous work [21], we presented the first exact model of random propagation in discrete form, provided an in-depth analysis of these dynamics of inefficiency, and derived the exact Markov chain for the simplest cyclic epidemic model, SIS. This paper discusses in detail the modeling assumptions of random propagation and extends the derivation methodology to systems with more states. Step-by-step examples of our methodology are provided for three epidemic models of increasing complexity, together with the principles for modifying the propagation model and incorporating it into different malware models without repeating the burdensome derivation process, demonstrating the generality of the methodology and of the proposed propagation model. The advantage of our model is the exact representation of random propagation, at the cost of only two additional mathematical operations compared with the standard nonlinear model, whose errors can reach 9.88% against simulations (the errors of our model are all below 0.2%). The disadvantages of our model are its discrete formulation and its dependency on Markov chain assumptions in the subdynamics of random propagation; i.e., to calculate the propagation of malware, the dynamics need to depend only on the previous state of the system. However, this dependency does not apply to the complete malware model itself, as we show by deriving the exact dynamics of propagation for the malware model SEIRS, which has delays on two state transitions and needs to keep track of past states beyond the last one.
In summary, the contributions of this paper are:
The generalization of the methodology of derivation of the dynamics of random propagation for any compartmental model of malware.
The proposal of an alternative rule-based methodology of derivation, carried out by modification of the simplest form of the propagation dynamics.
An analysis of the impact of the propagation model in complex malware models, including incubation, temporary immunity, and network heterogeneity.
The rest of this paper is organized as follows:
Section 2 presents a review of the literature and related works.
Section 3 provides an in-depth discussion of random propagation and its two mechanisms of inefficiency. It also presents our proposed canonical states of random propagation and defines the proper way to model random propagation as Markov chains.
Section 4 showcases our methodology for deriving exact Markov chains of random propagation using three epidemic models: SIS, SEIRS, and 2SEIS.
Section 5 presents the derived complete dynamics of the three epidemic models using our proposed propagation model.
Section 6 presents the validation of our proposed model and compares its performance with the most popular model in the literature.
Section 7 summarizes the paper and presents our conclusions.
2. Review of Literature Models
Malware models describe the rate of change of states in a network, in particular the states of devices with respect to their behavior before and after infection. All devices presenting the same state are grouped in a population of the same name, whose size is tracked by the model. In this work, we limit our analysis to discrete models, which are formulated as difference equations:

$$\mathbf{X}[T+1] = \mathbf{X}[T] + \Delta\mathbf{X}[T] \tag{1}$$

where $\mathbf{X}$ is the vector of states representing the sizes of the different populations, T is the current instance of discrete time, and $\Delta\mathbf{X}$ is the resulting dynamics of malware that alter the sizes of the populations.
The use of compartmental models of epidemics to represent the change of states in populations started with Kermack and McKendrick, who proposed an SIR model that described an epidemic as a noncyclic process in which susceptible individuals are infected and then acquire a recovered (immunized) state. This simple yet insightful model introduced the two basic compartmental dynamics, the infection dynamic $\beta S I / N$ and the recovery dynamic $\gamma I$, shown here in discrete time in Equations (2a)–(2c):

$$S[T+1] = S[T] - \beta \frac{S[T]\, I[T]}{N} \tag{2a}$$

$$I[T+1] = I[T] + \beta \frac{S[T]\, I[T]}{N} - \gamma I[T] \tag{2b}$$

$$R[T+1] = R[T] + \gamma I[T] \tag{2c}$$

where $\gamma$ is the rate of recovery. Since then, additional dynamics were introduced in the form of additional states, transitions between states, and parameters.
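To make Equation (1) and the two basic dynamics concrete, the following minimal Python sketch iterates a discrete SIR model. It assumes the infection term $\beta S I / N$ and a recovery term $\gamma I$; the name of the recovery rate, the function names, and the default parameter values are illustrative choices, not values taken from the original SIR model.

```python
def sir_step(s, i, r, beta, gamma, n):
    """One turn of a discrete SIR model: X[T+1] = X[T] + dX[T]."""
    infections = beta * s * i / n  # infection dynamic (standard nonlinear term)
    recoveries = gamma * i         # recovery dynamic
    return s - infections, i + infections - recoveries, r + recoveries


def run_sir(n=1_000, i0=10, beta=0.5, gamma=0.1, turns=100):
    """Iterate the difference equations and return the (S, I, R) trajectory."""
    s, i, r = float(n - i0), float(i0), 0.0
    trajectory = [(s, i, r)]
    for _ in range(turns):
        s, i, r = sir_step(s, i, r, beta, gamma, n)
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    final_s, final_i, final_r = run_sir()[-1]
    print(f"S={final_s:.1f}  I={final_i:.1f}  R={final_r:.1f}")
```

Because every device leaving one population enters another, S + I + R stays equal to N at every turn, which is a quick sanity check for any discrete compartmental model.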
The most common states modeled for network populations are susceptible (S), exposed (E), infected (I), recovered (R), and dead (D). Susceptible devices operate normally. They are not infected yet and are thus susceptible to infection; they are the equivalent of healthy human populations that were never infected by the pathological agent in question and have not developed antibodies or any immunity to it. Exposed devices are already compromised by malware but not yet fully expressing the infection; they are equivalent to human individuals whose infection is in an incubation phase. In the case of computer malware, this state usually represents infected devices that are not yet actively reproducing the malware. Infected devices carry the malware and actively reproduce it. Recovered devices were recently cleaned and acquired some form of immunity, either to the single type of malware that previously infected them or to a group of malware; this immunity can be either permanent or temporary. Dead devices are usually not operating at all, either because malware took them down or because the network administration itself responded to the epidemic by shutting them down indefinitely (meaning that the time it takes to perform device maintenance is longer than the time window taken by the epidemic to fully develop).
All mathematical models of malware are based on a mix of these common states, developing specific transitions between them according to the modeled network. Examples can be found in SIRS models [22,23], SEIRS models [24,25,26], and SEIRD models [27,28], to cite the most popular combinations. On top of them, countless variations introduce one or more new subdynamics; e.g., [22] proposed an SIRS-L model to account for the low-energy mode of wireless sensors.
However, modeling just the stages of malware infection can lead to models that fail to represent even minimally the dynamics of IoT malware epidemics, due to the increased (and increasing) complexity of IoT networks. For example, instead of static network structures connected to the Internet through a single gateway, IoT networks have many fluid structures that can connect smartphones and other mobile devices intermittently, making it harder to isolate and control access to the network [29]. Moreover, the diversity of software platforms on which Internet connectivity is built makes IoT networks strongly heterogeneous compared with earlier networks [30], enlarging the attack surface. This complexity can be introduced into models by adding more parameters and elements to the transitions between states, as the model of [28] did for heterogeneous IoT networks where malware spreads both via standard network infrastructure and via device-to-device connections (Equations (3a)–(3e)):
where
is a constant birth rate and death rate of devices;
is the probability of successful patching,
is the probability of losing immunity;
and
are, respectively, the probability of a node exhausting power either naturally or due to malware;
is the probability of natural death of recovered devices;
and
are, respectively, the probability of transition of nodes from state
S to
E either by infrastructure network communication or by device-to-device communication, and
and
are equivalent for
E to
I.
Despite this main research trend of proposing more realistic models by diversifying the modeled dynamics, the improvement of the core subdynamics of malware epidemics, the dynamics of random propagation of malware, has been largely ignored, remaining in the same simplified form since the Kermack and McKendrick model. This is our present focus: to turn the current dynamics of malware propagation into exact models of random propagation. To understand this research gap, we review the modeling of random propagation separately.
Random Propagation of Malware: A Subcomponent of Malware Models
Although mathematical modeling of IoT malware has branched into important subfields and proposed complete models with valid choices of simplification, the dynamics of malware propagation is a subelement of modeling that has remained universally simplified since the seminal work of Kermack and McKendrick. Here, “propagation dynamics” means the logic of target selection used by malware as well as the mathematical expression that describes the rate of change of susceptible populations into compromised states (be it infected, exposed, or others). This submodel can be found in virtually every model in the form of $\beta S I / N$. Excluding models that develop very specific propagation dynamics due to highly constrained possibilities of contact between individuals, all models in the literature assume global random propagation dynamics. (Note that random propagation dynamics is also universally assumed because it can easily be modified to include simple constraints on the possibilities of contact between individuals, working as a canonical form of propagation.)
Among all studies that assume global random propagation, the overwhelming majority (more than 95% of the literature, according to our review) use the Kermack and McKendrick submodel to represent it. It is so widespread that it will be called the “standard nonlinear model” in the rest of this paper. The remaining 5% of models found in the literature either use the simplest possible model, the linear expression $\beta S$, or do not represent random propagation at all. Below is a discussion of these two most common models for global random propagation of malware. They are presented in discrete time to facilitate comparison with our proposed discrete model later in the paper.
Linear Model [9,10]: In this model, randomness is abstracted as just the rate of infection $\beta$. The number of infections is proportional to the population of susceptible devices, and no consideration is given to the number of attacking devices.

$$\Delta I_{\text{prop}}^{\text{lin}}[T] = \beta\, S[T] \tag{4}$$

where $\Delta I_{\text{prop}}^{\text{lin}}[T]$ is the number of devices infected in this turn only, according to the linear model. It is different from both the population of infected devices I and its variation $\Delta I[T]$ (the latter also depends on the number of devices cleaned in this turn).
This model is very convenient to insert in complicated malware models because it yields closed-form solutions more easily, but the error of estimation of infected populations can reach
[21]. This is because the model does not account for the number of infected devices that are performing attacks.
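The practical difference between the linear model and the standard nonlinear model (Equation (5)) is whether the number of attackers matters. The short Python sketch below, with arbitrary illustrative values for N, S, and $\beta$ and hypothetical function names, evaluates both expected numbers of new infections for a fixed S and a growing I.

```python
def new_infections_linear(beta, s):
    """Linear model: new infections proportional to S only."""
    return beta * s


def new_infections_standard(beta, s, i, n):
    """Standard nonlinear model, Equation (5): the linear term scaled by I/N."""
    return beta * s * i / n


if __name__ == "__main__":
    n, s, beta = 10_000, 9_000, 0.3
    for i in (10, 100, 1_000):
        print(f"I={i:5d}  linear={new_infections_linear(beta, s):7.1f}  "
              f"standard={new_infections_standard(beta, s, i, n):7.1f}")
```

The linear prediction is the same whether 10 or 1,000 devices are attacking, which is the missing dependence on I described above.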
Standard Nonlinear Model [11,12,13,14,15]: In this model, randomness is modeled as the number of encounters between susceptible and infected devices [11]. The expression of the linear model is multiplied by the fraction of infected devices in the network, yielding Equation (5):

$$\Delta I_{\text{prop}}^{\text{std}}[T] = \beta\, S[T]\, \frac{I[T]}{N} \tag{5}$$
This model is more realistic because few infected devices will lead to few attempted infections during a turn, regardless of the availability of susceptible devices and the rate of infection. It also takes into account that infected devices can target other infected devices (since the target is chosen at random among all network members). However, it does not consider that random attacks are uncoordinated and can lead to several devices attacking the same target. An important consequence is that the standard nonlinear model overestimates the number of infections. Refer to our previous work [21] for a detailed discussion of the difference between the pseudo-random propagation present in this model and true random propagation, which considers both dynamics.
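One way to see the overestimation is to compare Equation (5) with an expression that also accounts for repeated targets. The Python sketch below assumes that each infected device launches exactly one attack per turn, picks its target uniformly among all N devices, and succeeds with probability $\beta$, independently of the other attackers. Under these assumptions, a susceptible device escapes all I attacks with probability $(1 - \beta/N)^I$, a Reed–Frost-style expression. This is an illustration only and is not necessarily identical to the exact model derived in [21]; the function names are hypothetical.

```python
def standard_nonlinear(beta, s, i, n):
    """Equation (5): expected new infections per turn, beta * S * I / N."""
    return beta * s * i / n


def collision_aware(beta, s, i, n):
    """Illustrative Reed-Frost-style alternative (an assumption of this sketch).

    Each of the I attackers hits a given device with probability 1/N and
    succeeds with probability beta. A susceptible device escapes every attack
    with probability (1 - beta/N)**I, so repeated targets are never counted twice.
    """
    return s * (1.0 - (1.0 - beta / n) ** i)


if __name__ == "__main__":
    n, beta = 10_000, 0.6
    for i in (1, 100, 1_000, 5_000):
        s = n - i
        std = standard_nonlinear(beta, s, i, n)
        alt = collision_aware(beta, s, i, n)
        print(f"I={i:5d}  standard={std:8.1f}  collision-aware={alt:8.1f}  "
              f"overestimate={100 * (std / alt - 1):5.2f}%")
```

With a single attacker the two expressions coincide, since no target can be repeated; as I grows, concomitant attacks on the same device become common and the standard expression grows faster than the collision-aware one.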
3. Random Propagation of Malware
3.1. Basic Definitions
Our proposed models are based on the following assumptions:
Discrete model of time.
Fixed number of network devices.
Markov chain assumptions for propagation dynamics.
No local constraints to global propagation.
Discrete models are used throughout this paper, both for the proposed models of propagation and for the complete malware models that use them. Time is defined as discrete turns, changing incrementally from one to . Moreover, the modeled network has a fixed number of devices N and is isolated from external contact. The propagation of malware is considered to start from an initial number of already-infected devices.
All propagation models are derived as Markov chains, in which the next state depends only on the current one. Note that only the dynamics of malware propagation need to satisfy Markov chain assumptions, not the entire malware model; examples are shown in the second and third malware models studied in this paper. The last assumption requires global reach and no bias in target selection by attacking devices, i.e., no local constraints.
As long as these assumptions are satisfied, our model of random propagation and its methodology of derivation are applicable to any malware model that spreads randomly.
Three different systems of network + malware were studied. Each had a different combination of compartmental states: (1) the simplest compartmental epidemic model susceptible–infected–susceptible (SIS); (2) an extended model susceptible–exposed–infected–recovered–susceptible (SEIRS) with time delay dynamics; and (3) a heterogeneous network with different behavior regarding malware infection, which was modeled as a double susceptible–exposed–infected–susceptible model and named 2SEIS.
In order to derive the complete dynamics of an epidemic system, two terms are necessary: a propagation term, $\Delta I_{\text{prop}}$, which represents the rate of transition of susceptible devices into a compromised state (which varies depending on the compartmental model), and a mitigation term, $\Delta I_{\text{mitig}}$, which represents the rate of recovery of infected devices into a susceptible or recovering state, depending on the model. The term $\Delta I_{\text{prop}}$ varies throughout this paper, but for the sake of simplicity, the same mitigation term is assumed for all models: a simple, standard form of mitigation with network-wide scans that detect infections with detection rate $\delta$ and clean the detected infections. Therefore, the term for this device-by-device mitigation of malware is given in Equation (6):

$$\Delta I_{\text{mitig}}[T] = \delta\, I[T] \tag{6}$$

Once $\Delta I_{\text{prop}}$ is derived, it can be composed with $\Delta I_{\text{mitig}}$ to build the vector $\Delta\mathbf{X}$ (Equation (1)).
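Because the mitigation term is shared by all models, the complete dynamics can be assembled by plugging different propagation terms into the same update. The Python sketch below does this for a discrete SIS model. The parameter values, the function names, and the collision-aware propagation term (a Reed–Frost-style stand-in) are illustrative assumptions, not the exact propagation model of this paper.

```python
from typing import Callable, List, Tuple

# A propagation term maps (S, I, N) to the expected number of new infections per turn.
Propagation = Callable[[float, float, float], float]

BETA, DELTA = 0.6, 0.3  # illustrative rates of infection and detection


def standard(s: float, i: float, n: float) -> float:
    """Standard nonlinear propagation term, Equation (5)."""
    return BETA * s * i / n


def collision_aware(s: float, i: float, n: float) -> float:
    """Illustrative Reed-Frost-style propagation term (assumption of this sketch)."""
    return s * (1.0 - (1.0 - BETA / n) ** i)


def run_sis(prop: Propagation, n: float, i0: float, turns: int) -> List[Tuple[float, float]]:
    """Discrete SIS model: X[T+1] = X[T] + dX[T], with dX composed of a
    propagation term and the mitigation term DELTA * I of Equation (6)."""
    s, i = n - i0, i0
    trajectory = [(s, i)]
    for _ in range(turns):
        d_prop = prop(s, i, n)  # susceptible -> infected
        d_mitig = DELTA * i     # detected and cleaned -> susceptible
        s, i = s - d_prop + d_mitig, i + d_prop - d_mitig
        trajectory.append((s, i))
    return trajectory


if __name__ == "__main__":
    for name, prop in (("standard", standard), ("collision-aware", collision_aware)):
        s_end, i_end = run_sis(prop, 10_000.0, 100.0, 200)[-1]
        print(f"{name:16s} S={s_end:8.1f}  I={i_end:8.1f}")
```

With the standard term, S settles at $N\delta/\beta$; with the collision-aware term, S settles higher, i.e., fewer devices remain infected, because repeated targets no longer count as extra infections.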
3.2. Canonical States of Random Propagation
In our previous work [21], we described in detail how to properly model the propagation of malware by random attacks, accounting for the disparity between the number of attempted infections and actual infections (equal or smaller, due to the possibility of failed attempts). We identified two sources of inefficiency during attempted infections by random uncoordinated attacks: repetition of targets by different attacking devices and attacks performed on already-infected devices. According to these possibilities, we classified attacks as efficient or wasted and defined a maximum of four types of infection attacks, summarized below:
Attacking a susceptible device for the first time (efficient).
Making a concomitant attack on a susceptible device when the other attack failed (efficient).
Making a concomitant attack on a susceptible device when the other attack is successful (wasted).
Attacking another infected device (wasted).
For modeling purposes, this amounts to a single type of attacker (infected devices) and four types of targets (one for each of the scenarios above). They are represented in Figure 1.
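The four attack types can be made tangible with a one-turn Monte Carlo sketch in Python. It assumes that every infected device launches one attack per turn, that targets are drawn uniformly from all N devices, that concomitant attacks on the same target are resolved in launch order, and that each attack on a susceptible target succeeds with probability $\beta$. The function name and these ordering rules are illustrative assumptions, not the procedure of [21].

```python
import random
from collections import Counter
from typing import Set, Tuple


def classify_attacks(n: int, infected: Set[int], beta: float,
                     rng: random.Random) -> Tuple[Counter, int]:
    """Play one turn of uncoordinated random attacks and label each attack
    with its type (1-4). Returns (type counts, number of new infections)."""
    attacked: Set[int] = set()     # susceptible targets hit at least once this turn
    compromised: Set[int] = set()  # susceptible targets infected this turn
    types: Counter = Counter()
    for _attacker in infected:
        target = rng.randrange(n)  # uniform over all N devices
        if target in infected:
            types[4] += 1          # wasted: target was already infected
        elif target in compromised:
            types[3] += 1          # wasted: an earlier attack already succeeded
        else:
            types[2 if target in attacked else 1] += 1  # efficient attempt
            attacked.add(target)
            if rng.random() < beta:
                compromised.add(target)
    return types, len(compromised)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10_000
    infected = set(rng.sample(range(n), 3_000))
    counts, new = classify_attacks(n, infected, 0.6, rng)
    print(dict(sorted(counts.items())), "new infections:", new)
```

Only Type 1 and Type 2 attacks can produce new infections; Type 3 and Type 4 attacks consume an attacker's turn without changing any device, which is exactly the inefficiency a propagation model has to capture.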
These four types of targets are equivalent to different states that a target device can display when it is chosen for an attack. However, the varying number of states in different epidemic models makes it difficult to represent the four scenarios of random attack directly. For instance, models with fewer than four states, such as Equations (2a)–(2c), cannot fully track the changes in all four target populations. On the other hand, models with more than four states, such as Equations (3a)–(3e), may raise questions about whether the four target populations described above can accurately represent malware propagation in such a system. But as pointed out, the modeling of malware propagation is just one sub-aspect of modeling. The standard nonlinear model is reused by very different epidemic models and nonetheless holds the same level of precision, so far considered satisfactory by the literature. All this means that the four scenarios of attack represent abstract states related to the phenomenon of propagation and are independent of other modeling aspects. Therefore, these states will hereafter be called the “canonical states of random propagation”, since they significantly facilitate the derivation of mathematical expressions for the rate of change of random infection, regardless of the number of system states or their relationships. These canonical states are described in Table 2:
Together they represent all devices in a network, providing an alternative partition of the whole device population, as Equation (7) points out.
where
are susceptible devices attacked for the first time (victim of attack Type 1);
are susceptible devices in the process of concomitant attacks, but not infected yet (victim of attack Type 2);
are susceptible devices in the process of concomitant attacks of which one already succeeded (victim of attack Type 3); and
are already-infected devices that are also attacking the network (victim of attack Type 4).
also represents any device whose state cannot change after a malware attack for any given reason (for example, permanently immunized devices or devices inaccessible to malware).
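The canonical states also lend themselves to a simple numerical check. The Python sketch below uses the hypothetical names c1 to c4 for the four canonical populations (their symbols are not reproduced here) and assumes that the I attacks of a turn are launched one at a time, each landing on a uniformly random device and succeeding on a susceptible target with probability $\beta$. Because every transition probability is linear in the population sizes, the expected sizes can be propagated exactly, attack by attack. This illustrates the bookkeeping only; it is not the derivation of Section 4.

```python
def canonical_expectations(n: float, s: float, attacks: int, beta: float):
    """Track the expected sizes of four canonical populations attack by attack.

    c1: susceptible, not attacked yet in this turn        (Type 1 target)
    c2: susceptible, attacked, all attacks failed so far  (Type 2 target)
    c3: compromised earlier in this turn                  (Type 3 target)
    c4: infected at the start of the turn, or immune      (Type 4 target)
    """
    c1, c2, c3, c4 = float(s), 0.0, 0.0, float(n - s)
    history = [(c1, c2, c3, c4)]
    for _ in range(attacks):
        p1, p2 = c1 / n, c2 / n  # chance that this attack lands in c1 or c2
        c1, c2, c3 = (
            c1 - p1,                           # a hit device leaves c1 either way
            c2 + p1 * (1 - beta) - p2 * beta,  # failed first attacks join c2
            c3 + (p1 + p2) * beta,             # successful attacks join c3
        )
        history.append((c1, c2, c3, c4))       # c4 never changes (Types 3 and 4 are wasted)
    return history


if __name__ == "__main__":
    n, s, beta = 10_000, 7_000, 0.6
    c1, c2, c3, c4 = canonical_expectations(n, s, n - s, beta)[-1]
    print(f"c1={c1:.1f} c2={c2:.1f} c3={c3:.1f} c4={c4:.1f} sum={c1 + c2 + c3 + c4:.1f}")
```

The four populations always add up to N, mirroring Equation (7), and after all attacks the expected number of new infections c3 equals $S\,(1 - (1 - \beta/N)^I)$ under these assumptions.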
Our use of the word “canonical” is loosely borrowed from linear algebra: given a coordinate vector space, every vector in it can be written as a linear combination of a set of linearly independent vectors, called a basis of the vector space. There are infinitely many possible bases for a vector space, but one set of vectors is the simplest possible: unit vectors along each of the mutually perpendicular coordinate axes. For example, given the vector space $\mathbb{R}^n$, its canonical basis is the set of vectors $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ below:

$$\mathbf{e}_1 = (1, 0, \dots, 0), \quad \mathbf{e}_2 = (0, 1, \dots, 0), \quad \dots, \quad \mathbf{e}_n = (0, 0, \dots, 1)$$
The advantage of canonical bases is that the representation of vectors in the space becomes simple and intuitive; no further simplification is possible beyond using the canonical basis to represent vectors. By analogy, the canonical states of random propagation are the simplest way to divide the total population of devices N into intuitive states that facilitate the derivation of the dynamics of random propagation. Another similarity with linear algebra is the convenience of transforming from one reference to the canonical form to perform easier calculations and then transforming back if the results are needed in the original reference. When calculating the exponential of a matrix, for example, it is common to diagonalize the matrix, calculate only the exponentials of the diagonal elements, and then transform the result back into the original basis. In the same way, epidemic states can be transformed into canonical states of random propagation to simplify the derivation of random propagation dynamics and then transformed back into epidemic states. As will be shown, systems with fewer than four states need to divide one condensed state into two or more canonical states, while systems with more than four states need to group their additional states into one or more of the four canonical states before proceeding with the derivation of random propagation dynamics using our methodology.
A final advantage of the canonical states is that, similarly to what is carried out with the standard nonlinear model, usually it is not necessary to repeat the entire derivation procedure, which is rather burdensome. Instead, the expression of random propagation can be modified intuitively to include more states and adapted to fit the complete system dynamics. Examples will be given in the malware epidemic models of this paper.
Next, the random propagation dynamics are derived and inserted into the complete system dynamics of three epidemic models, demonstrating the flexibility of our model. Later, all models are validated by comparison with an equivalent simulation.
6. Validation of Proposed Model
In order to demonstrate the validity of our model and compare its performance with the main model in the literature, we developed a stochastic simulation that performs infections and mitigation actions device by device. Using the same parameters for the Markov chain models and the simulation, the results of the Markov chains and of the simulation were generated independently and compared. The simulation results are independent of the Markov chains because Equations (51a)–(53e) represent the epidemic states of entire populations of devices without keeping track of the state of individual devices, whereas the simulation stores the state of every device. Moreover, the infection and detection rates behave as deterministic parameters in the equations, but in the simulation, they are real random tests performed at every attempt of infection and cleaning. By running the simulation on a sufficiently large population, the stochastic variability of the simulation is minimized, and a fair, independent comparison becomes possible. The simulation code is available in a public GitHub repository (link at the end of the paper).
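A device-by-device simulation of this kind can be sketched in a few dozen lines of Python. The sketch below is not the published simulation code: it covers only SIS dynamics and assumes that every device infected at the start of a turn attacks one target drawn uniformly from all N devices, that an attack on a device susceptible at the start of the turn succeeds with probability $\beta$, and that each device infected at the start of the turn is detected and cleaned with probability $\delta$. All names and parameter values are illustrative. It is compared with the standard nonlinear DTMC using the same parameters.

```python
import random
from typing import List


def simulate_sis(n: int, i0: int, beta: float, delta: float, turns: int, seed: int = 0) -> List[int]:
    """Device-by-device stochastic SIS simulation; returns I at every turn."""
    rng = random.Random(seed)
    infected = [False] * n
    for device in rng.sample(range(n), i0):
        infected[device] = True
    history = [i0]
    for _ in range(turns):
        attackers = [d for d in range(n) if infected[d]]
        newly_infected = set()  # a set absorbs repeated successful hits (wasted attacks)
        for _attacker in attackers:
            target = rng.randrange(n)  # uniform over all N devices
            if not infected[target] and rng.random() < beta:
                newly_infected.add(target)
        for d in attackers:  # detection and cleaning of devices infected at turn start
            if rng.random() < delta:
                infected[d] = False
        for d in newly_infected:
            infected[d] = True
        history.append(sum(infected))
    return history


def standard_dtmc(n: int, i0: int, beta: float, delta: float, turns: int) -> List[float]:
    """Standard nonlinear DTMC with the mitigation term delta * I."""
    s, i = float(n - i0), float(i0)
    history = [i]
    for _ in range(turns):
        new, cleaned = beta * s * i / n, delta * i
        s, i = s - new + cleaned, i + new - cleaned
        history.append(i)
    return history


if __name__ == "__main__":
    N, I0, BETA, DELTA, TURNS = 10_000, 100, 0.6, 0.3, 100
    sim = simulate_sis(N, I0, BETA, DELTA, TURNS)
    std = standard_dtmc(N, I0, BETA, DELTA, TURNS)
    tail_mean = sum(sim[-20:]) / 20
    print(f"simulation (mean of last 20 turns): {tail_mean / N:.3f}  standard DTMC: {std[-1] / N:.3f}")
```

Averaging the last turns of the simulation smooths out its stochastic fluctuations; under these assumptions, the simulated endemic level stays below the one predicted by the standard nonlinear model.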
Figure 7, Figure 8 and Figure 9 show the results of the simulation of SIS, SEIRS, and 2SEIS. Although simulations were performed for many different parameters (see Table 6), only the figures for the high rate of infection are shown, since this is the scenario with the largest degradation of performance for the standard nonlinear model. This and other results are discussed later in this section.
6.1. Parameters
To nullify the variability of the stochastic simulation, a network model of IoT devices was used, at which size the variability of our random tests decreases to ∼0.1%. The network is isolated from external interactions, and the simulations start with a percentage of devices already infected from the beginning, which then propagate the malware to other devices. This initial percentage of infections is in all simulations.
Two important parameters are the rate of infection $\beta$ and the detection rate $\delta$. Their ratio approximates $R_0$, the “basic reproduction number”, which defines the behavior of the malware epidemic:

$$R_0 \approx \frac{\beta}{\delta} \tag{54}$$

$R_0 < 1$ creates a scenario of natural eradication of malware;
$R_0 = 1$ creates a balance between infection and mitigation, leading to slow transitions; and
$R_0 > 1$ propagates malware aggressively until an endemic state with high percentages of infection.
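Equation (54) and the three regimes can be checked numerically with the standard nonlinear SIS model. In the Python sketch below, parameter values and function names are arbitrary illustrative choices. For $R_0 > 1$, the endemic level it converges to is $N(1 - 1/R_0)$, which follows from setting $\beta S I / N = \delta I$ in the standard model; this level does not hold for the exact model.

```python
def basic_reproduction_number(beta: float, delta: float) -> float:
    """Equation (54): R0 is approximated by the ratio beta / delta."""
    return beta / delta


def regime(r0: float, tol: float = 1e-9) -> str:
    """Qualitative behavior of the malware epidemic for a given R0."""
    if r0 < 1 - tol:
        return "natural eradication"
    if r0 > 1 + tol:
        return "aggressive propagation to an endemic state"
    return "balance between infection and mitigation (slow transitions)"


def standard_sis_final_infected(n: float, i0: float, beta: float, delta: float, turns: int) -> float:
    """Iterate the standard nonlinear SIS model and return I after `turns` turns."""
    s, i = n - i0, i0
    for _ in range(turns):
        new, cleaned = beta * s * i / n, delta * i
        s, i = s - new + cleaned, i + new - cleaned
    return i


if __name__ == "__main__":
    n, i0, beta = 10_000.0, 100.0, 0.4
    for delta in (0.8, 0.4, 0.2):  # R0 = 0.5, 1, 2
        r0 = basic_reproduction_number(beta, delta)
        i_end = standard_sis_final_infected(n, i0, beta, delta, 500)
        print(f"R0={r0:.1f}  {regime(r0):60s} I/N after 500 turns = {i_end / n:.4f}")
```

For $R_0 = 1$, the infected population still declines, but only slowly, which matches the “slow transitions” of the balanced regime.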
The focus of our previous work was a mitigation strategy that compensated for weak security in IoT networks, modeled as low detection rates of malware by the detection system. This goal led to a different choice of parameters from that of the present study. We validated our propagation model in SIS dynamics for three scenarios, one for each of the three regimes of $R_0$ described above. We used the same low rate of infection $\beta$ to emulate weak network security and varied $\delta$ to generate each of the three values of $R_0$ according to Equation (54). With this choice of parameters, our 2021 study [32] proposed a network-level mitigation strategy for constrained networks, while our 2023 study [21] extended it to global networks.
In this work, we focused on the generalization of derivation of the propagation dynamics and on the investigation of its improvement over the standard nonlinear model. One of the main objectives was to identify the maximum errors of both models. Since the previous work limited to , this time we varied it to three values representing weak, average, and aggressive rates of infection: a low value of , an average value of and a high value of . In the previous study, we identified that bigger errors occur when (aggressive rate of infection); therefore, this time, we fixed at and made for each of the three chosen values for described above.
Besides and , the SEIRS model has two additional parameters: the exposed delay and the recovery delay . The same value was used for them in all results: and . Although bigger delays are important to explore the behavior of the SEIRS model, the objective here is to investigate the difference in precision between the two propagation models (our proposed model and the standard nonlinear model). Since the measured variable is the infected population I, the most accentuated difference of precision will happen when the infected population reaches its highest value. However, when the delays are big, devices stay longer in the delay-related state, inflating their size and consequently decreasing the size of S and I. Therefore, the delays were kept at the very minimum to still have SEIRS behavior and create the biggest difference in precision between propagation models.
The 2SEIS model has only the infection delay but requires the previous definition of how many devices are CCTV and how many are mobile. of the network was set for each. Regarding , the same value of exposed delay was used.
6.2. Validation of Proposed Propagation Terms for SEIRS and 2SEIS
Table 6 presents four metrics to evaluate the prediction accuracy of our proposed DTMC and of the standard nonlinear one. The four metrics were calculated only for the curve of susceptible devices S, which in all simulated scenarios has the largest value of any population and consequently displays the largest percentage error. “Max Error of S” and “RMSE of S” compare the results of a DTMC with its equivalent simulation (SIS, SEIRS, or 2SEIS). The first metric is simply the maximum error between the DTMC and the simulation. The second is the root-mean-square error of the time series of S between the DTMC and the simulation; its calculation is presented in Equation (55). The third metric represents the final value of the susceptible population of the DTMC. Finally, the fourth metric represents the time constant of the S time series. “Time constant” here is defined as how many turns it takes for
to reach
of its final value.
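The first two metrics can be sketched in Python as follows. Expressing the errors as a percentage of N is an assumption about how the percentages in Table 6 are normalized, and the RMSE uses the conventional formula rather than a transcription of Equation (55). The time-constant metric is left out of the sketch, and the function names and example series are hypothetical.

```python
import math
from typing import Sequence


def _check_lengths(model: Sequence[float], simulation: Sequence[float]) -> None:
    """Both time series must be non-empty and cover the same turns."""
    if len(model) != len(simulation) or not model:
        raise ValueError("time series must be non-empty and of equal length")


def max_error_pct(model: Sequence[float], simulation: Sequence[float], n: float) -> float:
    """Largest absolute gap between two S time series, as a percentage of N."""
    _check_lengths(model, simulation)
    return 100.0 * max(abs(m - s) for m, s in zip(model, simulation)) / n


def rmse_pct(model: Sequence[float], simulation: Sequence[float], n: float) -> float:
    """Root-mean-square error between two S time series, as a percentage of N."""
    _check_lengths(model, simulation)
    mse = sum((m - s) ** 2 for m, s in zip(model, simulation)) / len(model)
    return 100.0 * math.sqrt(mse) / n


if __name__ == "__main__":
    dtmc = [9_900.0, 9_500.0, 8_800.0, 8_200.0]
    sim = [9_900.0, 9_520.0, 8_790.0, 8_230.0]
    print(f"max error = {max_error_pct(dtmc, sim, 10_000):.3f}%  "
          f"RMSE = {rmse_pct(dtmc, sim, 10_000):.3f}%")
```

The maximum error reports the single worst turn, while the RMSE summarizes the whole trajectory, which is why a model can show a moderate maximum error but a small RMSE when the gap is concentrated in a few turns.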
In addition to the table results, the time series of S is shown for both propagation models and for the simulation in Figure 10, Figure 11 and Figure 12, and the time series of the error between the DTMCs and the simulation in Figure 13, Figure 14 and Figure 15. As with the figures in the previous subsection, results are shown only for the scenario of high $\beta$.
The first result to discuss is the accuracy of our proposed model and the inadequacy of the standard nonlinear model. The Max Error of S of the latter is ∼3% for low $\beta$, ∼5% for average $\beta$ and ∼8% for high $\beta$, with a maximum error over all cases of 9.88%. In our previous work, the maximum error was only 3.15% for the SIS model (mainly because we investigated a new mitigation strategy focused on low infection and detection rates). However, the error of the standard nonlinear model can reach ∼10%. Our proposed model, on the other hand, has the same range of error, ∼0.15%, over all scenarios and models. This confirms our previous conclusion: such a small and constant error is just the variability of the stochastic simulation and is independent of parameters and models. The same conclusion can be drawn from Figure 13, Figure 14 and Figure 15, where the error of S for the standard model increases as turns pass and infections grow, while the error of our proposed model oscillates very slightly around 0%. Moreover, the RMSE of our model is extremely low (∼0.05%), while the standard nonlinear model has RMSEs as high as 9.33%. These quantitative results clearly demonstrate that our model is an exact match for the phenomenon of random propagation, independently of parameters or epidemic models.
Conceptually, such differences between propagation models can be explained by the dynamics that the standard nonlinear model actually represents exactly: pseudo-random propagation with only one inefficiency mechanism, the malware targeting already-infected devices. The repetition of targets is not included in the short expression $\beta S I / N$. More details on this conclusion can be found in our previous work [21], where we demonstrated this exact match for the standard model with simulations.
The last two metrics show the qualitative differences between the two propagation models compared.  of the standard model is 50% for all scenarios, but in our exact model it is higher than 50% and increases with . This is an important conceptual difference, especially in light of the mitigation strategy proposed in our previous work. That strategy involves a trade-off between the number of undetected infections cleaned in the group mitigation and the unnecessary cleaning (which causes operational downtime) of devices that are already clean. Our control policy is therefore based on calculating the number of blind cleanings needed to match the percentage of the network that is infected. The reason for the difference in  between the standard and proposed models can be understood by examining the dynamics of S in the equilibrium state of both models. Taking the 2SEIS example, the expression for  in the standard model is given in Equation (56). It is independent of I, being simply the inverse of  times N. In our proposed model, however,  depends not only on  but also on the binomial term (Equation (57)). Together with the numerical simulation, this mathematical result points to an overestimation of malware propagation by the standard model.
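The equilibrium argument can be checked numerically in the simpler SIS case. The sketch below assumes a discrete SIS model in which the per-turn infections are S·f(I) and the recoveries are γI, so the endemic equilibrium satisfies S·f(N - S) = γ(N - S). For the standard term f(I) = βI/N this gives S*/N = γ/β regardless of I; for an assumed distinct-target term f(I) = 1 - (1 - 1/N)^(βI), the equilibrium is found by bisection. The SIS recursion, the symbol γ, the parameter values and the function names are illustrative assumptions; this is not the 2SEIS expression of Equations (56) and (57).

```python
def infection_fraction_standard(i, n, beta):
    # Per-susceptible infection probability in the standard model.
    return beta * i / n


def infection_fraction_distinct(i, n, beta):
    # Assumed probability of being hit at least once by beta*i uniform picks.
    return 1.0 - (1.0 - 1.0 / n) ** (beta * i)


def endemic_susceptibles(n, beta, gamma, infection_fraction, tol=1e-9):
    """Solve S * f(N - S) = gamma * (N - S) for the endemic S by bisection.

    Dividing both sides by I = N - S gives g(S) = S * f(I) / I - gamma,
    which is increasing in S for both infection terms above.
    """
    def g(s):
        i = n - s
        return s * infection_fraction(i, n, beta) / i - gamma

    # g(0) = -gamma < 0 and g(n - 1) > 0 whenever beta / gamma is clearly above 1.
    lo, hi = 0.0, n - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With N = 1000 and β/γ = 2, the standard term gives S*/N = 0.5, while the assumed distinct-target term gives roughly 0.52 for β = 0.2 and roughly 0.68 for β = 2: above 50% and growing with the infection rate, qualitatively similar to the trend described above.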
The last metric () does not show any significant difference between the two models (although it does when compared with the linear model, as shown in our previous work). It is presented here anyway to add a brief observation on how many turns it takes for the malware epidemic to stabilize for different ranges of . Considering that  almost doubled with every change of scenario,  of S shows an almost linear dependence. However, investigation of this point is left for future studies.
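The number of turns needed for an epidemic to stabilize can be measured from any time series of S. The sketch below uses a hypothetical criterion (the first turn after which S stays within a tolerance band around its final value) and, for illustration, the deterministic standard SIS recursion with assumed infection rate β and recovery rate γ. The criterion, the tolerance, the recursion and the function names are assumptions, not necessarily the definition of the metric reported above.

```python
def sis_standard_trajectory(n, i0, beta, gamma, turns):
    # Deterministic discrete SIS with the standard term beta*S*I/N (assumed).
    s, i = n - i0, i0
    series = [s]
    for _ in range(turns):
        new_inf = beta * s * i / n
        rec = gamma * i
        # Tuple assignment uses the old s and i on the right-hand side.
        s, i = s - new_inf + rec, i + new_inf - rec
        series.append(s)
    return series


def stabilization_turn(series, n_devices, tol_pct=0.5):
    """First turn from which S stays within tol_pct % of N of its final value
    until the end of the series (hypothetical stabilization criterion)."""
    final = series[-1]
    tol = tol_pct / 100.0 * n_devices
    turn = len(series) - 1
    # Walk backwards while the earlier values are still inside the band.
    while turn > 0 and abs(series[turn - 1] - final) <= tol:
        turn -= 1
    return turn
```

Doubling both β and γ keeps β/γ = 2, and therefore the standard equilibrium, unchanged while speeding up the per-turn dynamics, so the stabilization turn becomes smaller.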
These results show that the standard nonlinear model, widespread in the literature, is not a good model of truly random malware propagation, and that our proposed model represents the exact dynamics of random propagation at the small cost of a slightly more complex expression (a binomial with six operations versus a multiplication with three operations). Since both models are nonlinear anyway, the advantages of using the standard model instead of ours seem very small.
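The operation counts quoted above, and the direction of the discrepancy, can be made concrete under an assumed form of the exact term. Assuming, for illustration, that the exact new-infection term is the expected number of distinct susceptible devices hit by βI uniform target picks (with βI at least 1), one way of counting operations and comparing the two expressions is the following; the form of the exact term is our illustrative assumption, not a restatement of Equation (57).

```latex
\begin{aligned}
\text{standard:}\quad & \Delta_{\mathrm{std}} = \frac{\beta\, S\, I}{N}
  && \text{(3 operations: } \beta\cdot S,\ \cdot\, I,\ /N\text{)}\\[4pt]
\text{exact (assumed):}\quad & \Delta_{\mathrm{ex}} = S\left[1-\left(1-\frac{1}{N}\right)^{\beta I}\right]
  && \text{(6 operations: } 1/N,\ 1-\cdot,\ \beta\cdot I,\ (\cdot)^{\beta I},\ 1-\cdot,\ S\cdot\text{)}\\[8pt]
\text{Bernoulli } (r \ge 1,\ x \ge -1):\quad & (1+x)^{r} \ge 1 + r x
  \;\Rightarrow\; \left(1-\frac{1}{N}\right)^{\beta I} \ge 1-\frac{\beta I}{N}
  \;\Rightarrow\; \Delta_{\mathrm{ex}} \le \frac{\beta\, S\, I}{N} = \Delta_{\mathrm{std}}.
\end{aligned}
```

Under this assumption, the standard term can never fall below the distinct-target term once βI ≥ 1, which is consistent with the overestimation of propagation attributed to the standard model.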
7. Conclusions
In this paper, we presented the generalization of our exact model of random propagation of malware to any discrete malware model that makes Markov chain assumptions. We further developed an alternative methodology of derivation that applies two rules to modify the simplest form of the propagation dynamics (made for the SIS model) to any compartmental malware model, simplifying the derivation process.
To validate this generalization, we carried out both forms of derivation for three malware models (SIS, SEIRS, and 2SEIS), evaluated the equations with varying sets of parameters, and compared their results with the corresponding simulations. By comparing the time-series progression of the network populations obtained from the equations and from the simulations, we found stochastic errors of less than 0.2%. The same comparison for the standard nonlinear model (the most popular model in the literature) yielded an average error of 5% and a maximum error of 9.88%, almost two orders of magnitude larger.
This result confirms that our model is exact and that errors in the standard nonlinear model can significantly impair predictability. We therefore highlight the advantage of replacing the standard nonlinear model with our proposed model. The only limitation to its use is the requirement of Markov chain assumptions, which prevents its application to malware models with memory of the states S and I (models with memory of other states can still use our propagation dynamics, as seen in SEIRS).
In future work, we will present an investigation of the algebraic tractability of our propagation dynamics in the control laws of automatic mitigation systems.