1. Introduction
Evolution is one of the central paradigms of contemporary biology. At present, there is no mechanism or biological phenomenon which can be explained outside of evolutionary thinking. Moreover, Darwin’s theory of evolution has had a deep impact on other disciplines such as computer science. The possibility of simulating Darwinian evolution on a computer has resulted in an entire family of optimization algorithms (e.g., genetic algorithms (GAs)) grouped under the common umbrella of evolutionary computation [
1]. Genetic algorithms are mainly oriented to solve optimization problems in economics, mathematics, geology, industry, social sciences, bioinformatics, etc. In some instances, GAs are also applied in biology to simulate evolution, for instance, the early stages of the evolution of the vertebrate eye [
2] or the evolution of enzymes [
3], i.e., proteins with catalytic function within cells or the pattern of stripes on the skin of zebras [
4] are successfully simulated with a genetic algorithm. However, in all these examples, the evolution of a population of individuals occurs in a “non-living environment” composed of water, light, air, etc. Of course, the organisms living in an environment modify their physical area. Within this framework, the selection of organisms that will give rise to a new generation of individuals rests on their fitness value. The fitness value confers to an individual a probability of selection based on the quality of the solution for which an individual stand. Individuals therefore represent solutions to a common problem: individuals have to conform to a non-living environment and, in particular, to one of the many environments on planet Earth. Consequently, and assuming some level of abstraction, it is possible to simulate biological evolution with a genetic algorithm.
However, what happens when the evolution of a population takes place in a “living environment?” As an example, a virus infects an animal, which is referred to as host, evolving into the host animal. In this scenario, the viruses evolve within the host, but also the animal holds evolutionary mechanisms directed to defend the host from the harmful virus. In the host, the evolutionary mechanisms involve the immune system. The immune system represents the organism’s defense against infectious organisms (e.g., virus, bacteria) through a sequence of steps known as immune response. Currently, in a similar way as GAs are inspired by organic evolution (i.e., Darwinian evolution), the artificial immune system (AIS) algorithms [
5,
6,
7,
8,
9,
10] are inspired by the immune system of vertebrates. These algorithms share some of the characteristics of the GAs, and they are currently used for solving practical problems [
11,
12].
In the field of biology, the coexistence of an individual, for example a virus, within another individual, for example a mouse, is known as “within-host dynamics” (WHD) [
13,
14]. In this work, we refer to the invader individual (e.g., virus, bacteria) as “guest”, setting the term “host” to the recipient individual (e.g., mouse). In this paper, we designed an evolutionary WHD simulation model in which the guest evolves inside a host. Consequently, a novel ingredient of the WHD model is that it includes guest evolution. Evolution of the guest is simulated with a GA, whereas the defense mechanisms of the host are simulated through AIS algorithms. Another original feature of the model lies in its methodology showing how the use of heuristic algorithms inspired by biology is an alternative to classical mathematical modeling. We believe there is an interest in studying how a guest evolves within a host which defends its own integrity through the immune system. Moreover, we think that the interest of this issue goes beyond biology and may be relevant in other fields, e.g., computer science.
The use of ordinary differential equations (ODEs) has, to date, been one of the most appropriate techniques for understanding the mechanisms that govern host–guest kinetics [
15], assuming, like in this paper, a pathogen guest. In the realm of quasi-species theory [
16] (i.e., a theory about Darwinian evolution in self-replicating organisms), ODEs have also been successfully applied, e.g., in the study of HIV (Human Immunodeficiency Virus) virus evolution [
17]. Nevertheless, the WHD term generally refers to a host–guest coexistence model in which guest evolution is not included. In the scope of the present work, other modeling techniques have been applied. For example in Reference [
18], their authors conceive a qualitative simulation model of a
phage (a virus, i.e. guest) that infects
Escherichia coli bacterium (i.e. host).
According to the model described in Reference [
15], the within-host dynamics can be modeled as an ODE system:
where
T is the number of healthy host cells susceptible to infection,
is the rate of infection,
V is the number of viruses,
r is a model parameter related with viral replication, and
is the mortality rate of infected cells.
In 2005, a paper [
19] on the simulated evolution of HIV within an infected host was published, studying the effects of mutations and recombination (crossover). However, the model was a GA which does not include the host simulation. The authors only studied the effects of a simulated stochastic processes on a living host, e.g., in an individual affected by the disease.
In the present work, the evolutionary WHD simulation model was inspired by a specific biological scenario and then applied to an elementary example in computer science. In the biological scenario in which we were inspired by, a virus (guest) infects and evolves within the organ of a mouse (host) which defends its integrity from infection. Afterwards, we applied the model in computer science, considering a hypothetical example in which a computer program that is initially helpful to a computer evolves into a harmful program. Therefore, no matter which scenario is considered in the simulation experiments, two algorithms that perform opposite roles are confronted: the GA evolves the guest within the host, while a host artificial immune system (AIS) algorithm defends the latter from the evolutionary proliferation of the guest. The study, modeling, and simulation of this situation is the main novelty of this work. The WHD model is based on GAs and AIS, because such algorithms are inspired by biology, and both are quite accurate in emulating the mechanisms present in the organisms. In addition, such algorithms are sufficiently studied and have numerous applications in computer science.
In
Section 2 of this paper, we present the WHD model description and software realization including several simulation experiments of virus (guest) evolution within a mouse (host) with immune system.
Section 2.2. explains the modeling and simulation of virus evolution, and
Section 2.3. is a review of the AIS algorithms we applied in this paper. Following,
Section 2.4. describes the architecture of the evolutionary WHD model—how the GAs and the AIS algorithms interact with each other, modeling the effect of the AIS on the evolution of viruses. Next, in
Section 3, we explore the application of the model in the design of a program that detects the evolution of a malicious program in the memory of a hypothetical computer. Finally,
Section 4 presents the entire results of the computer simulation experiments, and
Section 5 discusses the possible impact of this work together with the general limitations of the model, suggesting future directions of advancement.
2. An Evolutionary WHD Model Inspired by Virus Evolution inside a Mouse with Immune System
In order to find the inspiration to design the evolutionary WHD model, we resorted to the following biological scenario. We simulated the evolution of an elementary model of virus using a GA, whereas the simulation of a mouse immune system was carried out by means of AIS algorithms. We assumed that virus evolution takes place within an infected cell belonging to a mouse organ. Virus evolution experiments were conducted both in the absence and under the effects of the immune system (
Figure 1). The WHD model comprised a host (i.e., mouse) and a guest (i.e., virus) modelled as explained in the following section.
2.1. Host and Guest Modeling
The present model symbolically defines both the mouse (host) and the virus (guest) by means of a binary string:
with
x (0,1). The binary string is organized in four substrings, reserving the first three to the definition of the identity of the mouse or virus and the fourth to the genome. Therefore, in both the mouse and the virus, the first three substrings are the sequences reserved for the AIS algorithms, and once defined they cannot be modified. In the case of the virus, only the fourth sequence is assigned to the GA, being subjected to evolutionary changes, e.g., by mutation.
The first substring
is a label that allows the identification of the mouse or virus. A second substring
represents in the mouse the organ in which the infection occurs and in the virus the protein envelope surrounding its genome, i.e., RNA (Ribonucleic acid, is the viral genetic material). The third substring
is a recognition sequence for the immune system. In the biological realm, the third sequence would be simulating the epitope or antigenic determinant. This means that it is a binary sequence performing the role of identifier, which is recognized by antibodies (B-lymphocytes,
Figure 1), the histocompatibility antigens (T-lymphocytes,
Figure 1), etc. Finally, a fourth substring
models the genome, i.e., the DNA (Deoxyribonucleic acid, is the mouse genetic material) from the mouse cells or the viral RNA.
In the mouse model, we included 12 possible organs. In the simulation experiments, a random organ is chosen to be the one infected by the virus. Inside this organ, the evolution of the virus will be simulated with the GA. Each organ of the mouse has a specific binary string with some substrings common to all organs and other variables:
The string above shows the common substring
that identifies the mouse (4 bits), the variable substring
labeling the infected organ (4 bits), and the epitope string
which includes a common part and another variable (16 bits). Although the mouse genome is not used in simulation experiments, as it does not undergo evolutionary changes, for formal reasons, it has been included in the model (64 bits):
In this paper, we simulated an RNA virus. In the virus model, the first three substrings (virus identification, protein envelope, epitope) were set to the values shown below:
The genome of the virus (i.e., RNA) was set-up as a zero sequence except for three 4 bit substrings representing the three genes of interest (64 bits):
In the present WHD model, for both mouse and virus, the first three strings with a fixed setting play an immunological role and, therefore, do not change, e.g., through mutations. Thus, these strings are not subject to evolution. Consequently, the evolution of the virus via GA within the mouse will take place because of GA operations on the viral genome or RNA whose string is shown above.
2.2. An Evolutionary Algorithm for Viral Cycle Simulation
Genetic algorithms are stochastic optimization methods inspired by the principles of natural selection [
1] established by Darwin’s evolutionary theory in 1859. At present, they are used in optimization and search problems, being trial–error methods in which the space of solutions is continuously explored and evaluated until an optimum is found (global or local). A simple GA comprises the following steps. First, (a) a set of solutions (individuals) is randomly generated. Then, (b) the goodness or validity of the solutions (adaptation to the environment) is evaluated, i.e., their fitness. Those solutions with a higher fitness value will have a greater probability of surviving and, therefore, of reproducing, i.e., of surviving to the next generation (gene propagation). The solutions that go to the next generation will be modified because of genetic mechanisms, in particular, mutations and/or genetic crossover which are simulated in GAs. The search process is repeated until an optimal solution is found.
The RNA virus model was inspired by the HIV virus simulated in Reference [
19], and we modeled it with the binary string described in
Section 2.1. Accordingly, and to simplify the model, many of the biological features of the virus [
20,
21,
22,
23,
24,
25] were not included in the present paper.
Figure 2 shows a simple GA adapted to the evolutionary dynamics of a virus. The model assumes that once a virus enters (E) inside a host cell, evolution occurs within the cell, in the absence of the release step (R). Initially (
t = 0), we generated a population of random virions, that is, a population of binary substrings (64 bits) modeling the viral RNA. Note that, as indicated in the previous section, the first three substrings with immunological significance were set in advance with arbitrary values. These lists were common to all viruses and were not subject to any evolutionary change. Next, we simulated the replication (R) and selection (S) of the virions that would pass to the next generation. The RS step involved the evaluation of the fitness of each viral particle. Once the virus population was assessed, the selection was carried out by the wheel parent selection method [
1,
3]. According to this method, all virions of the population become parental individuals. The probability of being selected depends on their fitness. In a roulette wheel, we define as many slots as virions have population, playing each virion to a slot whose surface is proportional to its fitness. The roulette wheel spin is simulated by calculating a probability value
p(
i), where
i is the RNA string number or its position in the viral population:
where
Fi is the fitness of a viral particle
i and
the total fitness of the population. Finally, we get a random number
Ui being
. The first virion or RNA string whose
p(
i) is greater than or equal to the random number
Ui will be reproduced once passing to the next generation. The procedure is repeated until a complete population of virions is obtained, maintaining a constant population size.
The fitness of the virions is calculated according to the criterion introduced in Reference [
19]. In the simulation experiments, we defined a target genome or RNA including three genes or binary strings with sequence values: G
1 = {1,1,0,0}, G
2 = {1,0,1,0}, and G
3 = {1,0,0,1}. Thus, we assumed that the rest of the RNA string could hold any sequence, being the sequences of the three genes those that significantly contributed to virus fitness and, therefore, to viral evolution. Following, for each virion
i and for the three regarded genes, the Hamming distance
was calculated between the RNA sequence of the virion
and the RNA target sequence
:
Applying the criterion [
19], the fitness value of virion
i is given by the number of genes whose sequence matches the RNA target sequence. That is, the fitness
Fi will be 0.33, 0.67 or 1.00 if one of the three genes, any two genes or the three genes simultaneously exhibit RNA target sequences:
Note that the above scores are multiplied by a weight (
w) value. Once the next generation of virions is obtained, the mutation (M) is applied, being simulated by the flip a bit technique [
1,
3]. In the GA, we did not include recombination or crossover. Since viruses are not released to the exterior (R) of the cell, the evolutionary loop is repeated until some convergence criterion is reached. The evolution of the viral population is confined inside the cell of a mouse organ, being evaluated its evolution by means of a performance graph (average fitness per generation).
2.3. Mouse Infection Model
The host subject of the modeling is a mouse, representing it symbolically according to the binary string defined in
Section 2.1. The mouse has an immune system for which a simulation is carried out via a family of algorithms grouped under the common term of AIS [
7,
8,
9]. This immune system provides the mouse with the ability to detect any external guest, e.g., virus. In immunology, virus detection is accomplished according two possible paradigms: either through intruder recognition (humoral immune response) or by the disturbances that intruder causes within the mouse (theory of danger). Next, we review the AIS algorithms that we have implemented and customized to the mouse model.
2.3.1. Thymus Model and Simulation: Negative Selection Algorithm
The algorithm simulates the function performed by the thymus, a small gland located between the lungs. Lymphocytes (i.e., cells responsible for fighting against external infectious intruders such as virus, bacteria, and including cancer cells) travel through the bone marrow (i.e., a tissue that fills the bones) to the thymus. Once there, the lymphocytes become mature T-cells which are capable of binding to external intruders, the latter being referred to as antigens (Ag). Antigens have a bond site termed an epitope. A particular class of lymphocytes known as B-cells secrete proteins called immunoglobulins or antibodies (Ab). These antibodies also have a binding site called a paratope. The function of antibodies is to recognize the antigens, binding to them in an antigen–antibody (Ag–Ab) reaction. As a result of this reaction, the antigen (i.e., the virus) is neutralized. In the present model, we simulated the Ag epitope with the
substring and the Ab paratope through its appropriate substring
. The Ag–Ab reaction was simulated as usual in this class of models based on binary sequences [
26,
27].
Lymphocytes must be able to bind only to antigens and not to the host organism itself (e.g., to the mouse organs) which is achieved by means of a negative selection algorithm [
5,
6,
8]. In the present model, the algorithm first generates random binary strings
modeling Ab paratopes in B-cells. Paratope strings have equal lengths (16 bits) as the third substring in mouse organs modelling histocompatibility antigens. Next, we calculated the Hamming distance,
, between the binary string
in the paratopes of B-cells and the strings
that represent the histocompatibility antigen of each organ:
The goal was to generate strings (Ab) that did not recognize the strings representing the mouse organs. In the model, and in this paper, we applied the key-lock principle and, therefore, there was recognition that when in one position of a binary string there is a 0 and there is a 1 in the homologous position of the other string or vice versa, i.e., 1 and 0. Therefore, the Hamming distance between each string and the string of each organ must be minimal, summing the Hamming distances into a total Hamming
DS distance:
If the total Hamming distance was less than a given self-recognition threshold
, then we selected the resulting string:
Since in the mouse model there are 12 organs defined (
O = 1, …, 12) and assuming as a criterion a maximum Hamming distance per organ of two units, we set a recognition threshold value equal to 24. The following boxes (
Appendix A,
Box A1 and
Box A2) show the pseudocode of the described algorithm. Applying
N times the negative selection algorithm, we obtained a population of
N lymphocytes composed of T-cells. Following, the lymphocyte population is stored (
Appendix A,
Box A2).
2.3.2. Humoral Immune Response: Clonal Selection Algorithm
According to the theory of clonal selection introduced by Burnet in 1959 [
8], only those lymphocytes capable of recognizing an antigen (e.g., virus) are selected and multiplied to the detriment of the remaining lymphocytes. In other words, there is a clonal expansion of lymphocytes, in particular B-cells, increasing numerically those B-cells with greater affinity for the antigens. In addition, such cells will gradually increase their affinity for antigens. Finally, once the clonal expansion is finished, some B-cells will become memory cells allowing the acquisition of immunity against future intrusions of similar antigens. In addition, other B-cells will secrete antibodies that neutralize (i.e., Ag–Ab) the antigen, i.e., virus.
The clonal selection algorithm [
28,
29,
30,
31] comprises three main steps: Ag–Ab recognition, clonal selection, and hypermutation (
Box A6). Firstly, the affinity between the binary string
modeling the Ab paratope carried by
i B-cell (
j is the bit position in the string) and Ag epitope
(i.e., the virus) is calculated by the Hamming distance. Based on this distance, we calculated the fitness of each B-cell being
k a proportionality constant:
In second place, the clonal selection was simulated by applying the algorithm of the wheel parents selection described above (
Section 2.2). Third, and last, hypermutation was simulated. This mechanism occurs during clonal expansion by modifying through mutation the genes that encode the antibodies present in B-cells. The mutation is greater when the affinity between the Ag and Ab is lower. In other words, the higher the fitness value of a B-cell, the lower the mutation probability in the Ab carried by the B-cell. The Ab mutation rate
is calculated by the following function:
where
is a parameter for the exponential decay setting. If a B-cell is chosen to mutate, whether a bit experiment or not, mutation in its Ab is decided according to a given hypermutation rate. The clonal selection algorithm, together with the negative selection algorithm, allow us to consider the lymphocytes (i.e., B-cells and T-cells) as natural classifiers. The pseudocode of the described algorithms is depicted in
Box A3,
Box A4,
Box A5,
Box A6 and
Box A7 (
Appendix A).
2.3.3. Theory of Danger: The Dendritic Cell Algorithm
In 1994, Matzinger [
8] proposed the theory of danger according to which the immune system is not triggered by the detection of, for example, a virus as it is the case with the humoral immune response paradigm. Once a virus enters into the mouse and its cells are damaged, a series of signals called danger signals are produced in response to the virus. From a biological point of view, the central idea is that only harmful intruders produce cell necrosis, i.e., the pathological death of the cell due to the fact of serious injury. We let the virus enter into the mouse and, according to the damages suffered by the mouse cells, it is decided whether or not to trigger the humoral immune response (algorithm of clonal selection,
Section 2.3.2).
The theory of danger was the inspiration for the dendritic cell algorithm (DCA) introduced by References [
32,
33]. A particular type of cell known as a dendritic cell (DC) is capable of receiving signals from both the inside and outside of the mouse cells. Dendritic cells are in three possible states: immature DC, semi-mature DC, or mature DC. Under normal conditions, the DCs are in an immature state monitoring the mouse cells. Therefore, a DC acts as a processor that receives input signals, processes them and sends output signals. Input signals can be of three types: pathogen-associated molecular patterns (PAMP) (PS), danger (DS), and safe (SS) signals. The first input signal reports the presence of harmful intruders (i.e., virus), and the second input signal notifies of damage in the mouse cells infected by the virus. Finally, the third input signal announces that the mouse cell is healthy (i.e., the possible alterations present in the cell are regarded as being within the normal limits of the cell). According to References [
32,
33], the inflammation mechanism (IC) has an amplification effect on the received signals. Once the input signals are processed and whether there is inflammation or not, the output signals are obtained. Output signals can be of three classes: co-stimulatory signals (
Ocsm), semi-mature signals (
Osemi), and mature signals (
Omat). The first output signal is related with the migration of the DC to the lymphatic node (i.e., an organ of the mouse immune system). Once a DC reaches the lymphatic node, the clonal selection algorithm will be activated (see
Section 2.3.2). The second kind of output signal reports that the mouse cellular environment is “safe”. Finally, the third type of output indicates the opposite, i.e., that the mouse cellular environment is in “danger”.
The processing of information in a DC takes place according to the following algorithm [
34,
35,
36]. As long as the processing of the input signals leads to an
Ocsm output below a given threshold, the processing of the input signal continues. Once the threshold is reached, then the levels of the output signals
Osemi and
Omat are compared. If the
Osemi is greater than
Omat, then DC becomes semi-mature, i.e., the mouse cell is in a “safe” or “no danger” state. Otherwise, if
Osemi is lower than
Omat, then DC reaches the mature DC state, emigrating from the damaged cell to the lymph node. Once the DC cell arrives at the lymphatic node, it triggers the humoral immune response, i.e., the mouse cells are in a “danger” state. The activation of the immune system takes place via cytokines (i.e., small proteins that have effects on other cells) that activate the T-cells, in the present model, by turning on the clonal selection algorithm. In the standard dendritic algorithm [
32,
33], the described model is expressed in terms of the following expressions:
and setting the values of the weights
w11,
w12, …,
w33 according to the problem on which the algorithm is applied. The pseudocode of the described algorithm is shown in
Box A8 (
Appendix A).
2.4. WHD Model Architecture
The simulation model (
Figure 3) relies on the hybridization of a simple genetic algorithm (virus evolution) with an artificial immune system (mouse defense system). Note how the artificial immune system model is a layered model as shown in
Figure 3. A simulation experiment starts with virus detection once the virus has entered within the mouse (
Figure 4). If the selected paradigm is the humoral immune response, then the virus detection algorithm is simulated with the clonal selection algorithm shown in
Box A9 (
Appendix A). In accordance with
Box A7, the clonal selection algorithm selects the optimal B-cell with maximum fitness and, therefore, a greater affinity with the virus. Afterwards, the Ag (virus)–Ab (B-cell) affinity is calculated with the Hamming distance. Based on the distance obtained, we evaluate whether it is equal or greater than a given affinity threshold value—10 in the present simulation experiments. In addition, if a predefined stop generation is reached (100 in the current experiments) then the algorithm simulates the Ag (virus)–Ab (B-cell) reaction.
However, in the case of the theory of danger paradigm, virus detection (
Figure 4) is held via a danger detection algorithm (
Box A10,
Appendix A). In this case, the dendritic cell algorithm (
Box A8) receives the input signals (PS, DS, and SS) and process the input by means of Equation (10). Signals are processed until the
Ocsm value reaches a threshold value. When this threshold value is reached, then the
Osemi and
Omat values are compared, and it is decided whether the DCs migrate from the mouse cell under surveillance to the lymphatic node.
Box A10 shows an elementary model of the PS, DS, and SS signals. In the viral infected mouse cells signals concentrations are updated for each generation. If the levels of PS or DS exceed predefined thresholds, then inflammation mechanism is triggered, setting the inflammation coefficient (IC) to a particular value.
Figure 5 and
Figure 6 show the coupling between the GA and AIS algorithms in the humoral immune response and theory of danger paradigms, respectively. Note how, in both cases, the hybridization among the two algorithms is in the Ag (virus)–Ab (B-cell) neutralization step. Neutralization reaction simulation was conducted as follows. We generated a random number
u,
, that was compared with a neutralization rate or Ag (virus)–Ab (B-cell) reaction probability for which the value
Nr was previously set:
If the B-cell successfully neutralizes the virus (), then the fitness Fi of the virus would be zero. Otherwise, if the virus is not effectively neutralized by B-cells (), then the viral fitness is calculated according to the Expression (4). Finally, it is evaluated whether the average fitness of the viral population is equal or less than a minimum value (minimum fitness average). In such a case, we assume that the virus is successfully neutralized by the immune system. Thus, it is presumed that, under an evolutionary point of view below , the virus is not viable. In such a case, the simulation experiment ends—the virus is removed from the mouse.
2.5. Simulation Experiments
First, we performed a series of control simulation experiments, i.e., we simulated the evolution of the virus via a genetic algorithm in the absence of an immune system that prevents its evolution. The mutation rate of the virus population was equal to 0.1 with the probability of a bit mutation equal to 0.3. The experiments were carried out with 80 virions studying their evolution during 2000 generations.
Secondly, we studied the evolution of the virus under the influence of the immune system. In the experiments with the immune system in place, the virus population size was also equal to 80 virions, and the mutation operator parameters were similar to those used in the control experiments. However, we only studied 200 generations after experimentally finding that it is an adequate number of generations. In this work, we conducted three different types of simulation experiments. First, we simulated the negative selection algorithm. Afterwards we simulated the two paradigms of virus detection: humoral detection and danger detection paradigms.
In the humoral detection simulation experiments, the clonal selection algorithm evolved a number of B-cells equal to the number of virus, i.e., 80, being the hypermutation rate equal to 0.8 and the parameter of exponential decay = 0.5 (Equation (9), for B-cell selection). Although the clonal selection algorithm evolved B-cells over 500 generations, in previous experiments we observed that the selection of the optimal B-cell was achieved in a smaller number of generations. Several simulation experiments were performed in order to study the effects of the neutralization rate Nr and the minimum value of the average fitness .
For experiments conducted under the theory of danger paradigm, clonal selection was simulated under same experimental conditions and parameters as described above in the humoral detection experiments. In Equation (10), the values of the weights were a set of values that are usual in the dendritic cell algorithm:
w11 = 2,
w12 = 1,
w13 = 2,
w21 =
w22 = 0,
w23 = 3 and
w31 = 2,
w32 = 1,
w33 = −3. The simulated signals model (
Box A10) was a simple model in which we updated only the PAMP (PS) signals, simulating only the infection by a microorganism. At each generation, PS was increased:
PS = 0.001 units. When the level of PS was above 0.5 units, the inflammatory mechanism was triggered, setting a value IC = 1.0.
3. Designing a Computer Algorithm for Detection of Emerging DOS Malware: A Simple Example of WHD Model Application
In order to illustrate how the WHD model described in this paper can be used in computer science, we conducted the following experiment. In the example, we assumed the existence of DOS (Disk Operating System) programs with the ability to evolve within a computer assuming that as a result of this evolution a program that was initially useful to the computer can be transformed into a computer virus, malware, etc. [
37]. We refer to these programs as “DOS malware”.
Suppose a hypothetical computer (
Figure 7) in its most elementary configuration with von Neumann architecture and that we label with the string
. Therefore, the computer comprises a memory unit and a processor that we labelled as
and
, respectively. In order to simplify the model as much as possible, we did not include the control unit, the arithmetic logic unit (ALU), accumulators, etc. In the computer memory there was a DOS program consisting of two code segments, chosen from a set of 16 possible code segments.
Table 1 shows 16 code segments labelled with a 4-bit string and their fitness value. The fitness scores or values were assigned taking into account that a fitness value is higher the greater the maliciousness of the code. Therefore, codes 1010, 1011, 1100, 1101 and 1110 have a low fitness value, since they do not threaten the computer integrity, performing useful or leisure tasks. In contrast, codes 0010 and 0011 will cause serious damage to the computer. In the experiment, we assumed that initially in the memory of the computer there was a DOS utility program, its code being the result of the merging of segments 1100 and 1110. Running the file on the computer displays the current date and time, followed by the browser opening and loading a search engine page. We generated an initial population of 80 similar 11001110 programs. At any given moment, the DOS programs evolved by natural selection in the memory
of the computer, transforming into DOS malware. We assumed for the simplicity of the experiment that once the DOS program transformed into a malicious code, the resulting computer virus was not a polymorphic code that was capable of changing to evade detection by the computer, for example, from an antivirus or any other detection system.
In the simulation experiments, the programs evolved according to a simple GA. Each program was represented by a vector or chromosome of 8 bits being coded by two segments of the code in
Table 1. The algorithm used a mutation operator as a variability source, choosing the population and bit mutation probabilities equal to 0.08 and 0.04, respectively. The fitness
Fi of an individual
i of the population (i.e., a DOS program) was obtained according to the following expression:
with
and
being the score received for each code segment (
Table 1) and
w1 and
w2, respectively, the weight values (
w1,
w2 = 10 in the experiments). Once the DOS programs were evaluated, the selection of those that would pass to the next generation was conducted by the wheel parent selection method [
1,
3] (see
Section 2.2). Eventually the population of programs converged to a program consisting of 0010 and 0011 segments. From a theoretical point of view, we considered the software, in our case the DOS program, as another computer component. For this reason, we initially assigned a 0000 label to the program 11001110. However, once this program evolved into malicious software, its label changed from
to
, denoting that it was a harmful code and not a useful component of the computer. Notice how, from an immunological point of view and according to the nomenclature described in
Figure 4, that this was an experiment in which a
self-program was transformed into a
nonself program. An example of this type of change is the transformation of healthy cells into cancerous cells [
38].
In this work, we performed two kinds of experiments: one in which the DOS program evolved in the absence of an algorithm that opposed their evolution and another in which the algorithm that was against the program’s evolution was present in the computer. In the second case, the evolutionary algorithm that evolved the DOS program was confronted with an artificial immune system algorithm that expressed the computer’s identity (
self). We named the program that implemented this second algorithm “tiny clonal antivirus” (TCA). First, TCA detected the presence of a malignant code (
nonself program) labeled
, applying the humoral immune response paradigm (
Figure 4). The detection was carried out by measuring the Hamming distance between the intruder program labeled
and the computer labeled
. Since the distance was a maximum of (H = 4), the stop signal stop[1] = 0 changed to the state stop[1] = 1 (positive detection). Secondly, the change of state of the stop signal triggered the clonal selection algorithm or sCLONAL(), neutralizing the DOS malicious programs. The parameters of the sCLONAL() function were similar to those used in
Section 2.5. The function nonself-self() was tailored to the present problem. In consequence, the change of state of the signal stop[0] = 0 to stop[0] = 1 (positive infection) occurs when the paratope sequence in the B-cell has maximum affinity (maximum Hamming distance) with the epitope of a malicious program, i.e., 0010 or 0011.
4. Results
In general, we conclude that the simulation experiments reasonably captured the evolutionary dynamics of the guest (i.e., virus, DOS programs) within a host (i.e., mouse, computer) that fights against guest evolution. The hybridization of genetic algorithms with artificial immune system algorithms allows us to conceive evolutionary WHD models for simulating scenarios that, to date, has not been modelled with differential equations.
We therefore conclude that the proposed evolutionary WHD model adequately simulates the coexistence of two agents exhibiting both an evolutionary dynamic against each other.
4.1. Control Experiment
Figure 8 shows the evolution of the virus in the control experiment (i.e., in the absence of the immune system) such as, for example, of a virus within an immuno-depressed host. In this case, we obtained the classic performance graph of a genetic algorithm.
4.2. Thymus Model and Simulation: Negative Selection Algorithm
The simulation of the mouse thymus (i.e., the organ where the negative selection algorithm occurs) adequately yielded to mature lymphocytes, e.g., B-cells. Such cells displayed the feature that they do not bind to mouse organs. For example, in one of the simulation experiments, we obtained twelve B-cells: 0101010101001000, 0101010101000001, 0101010101001011, 0101010101000111, 0101010101001001, 0101010101000000, 0101010101000010, 0101010101000101, 0101010101000011, 0101010101000111, 0101010101000110, and 0101010101001001.
Although this algorithm was not used in later experiments, it was part of the current study, as it may be of interest to immunologists, bioinformatics, etc.
4.3. Humoral Detection and Immune Response
Figure 9 and
Figure 10 shows representative results obtained in the humoral immune response simulation experiments. In one of the experiments (
Figure 9) with a neutralization rate of 25% (
Nr = 0.25) and minimum fitness value
= 50, it was observed that the immune system was not successful and the virus was not removed from the host. Therefore, in this experiment, the virus evolved in a stepped way until reaching the optimal genome. However, in another experiment (
Figure 10), when the neutralization rate was high up to 99% (
Nr = 0.99) and the minimum fitness value was
= 50, the virus was successfully cleared from the mouse.
Figure 10 shows how the average fitness of the virus population declined to zero in generation 101. However, in
Figure 9, the Ag–Ab neutralization reaction that takes in generation 100 was unsuccessful. Virus average fitness fell in generation 101, drawing a valley in the performance graph of the virus evolution. However, once the neutralization reaction was over, the virus population recovered and continued its evolution.
The minimum fitness value has an effect on the dynamics of the virus, which depends on the neutralization rate Nr. Studying different values of and Nr, we found that given some value of , a successful clearance of the virus occurred when the neutralization rate was equal or greater than 95%.
4.4. Danger Detection and Immune Response
Figure 11 shows an illustrative result obtained in the experiment based on the theory of danger. Although the viral dynamics seemed similar to that obtained for the humoral system (
Figure 9 and
Figure 10), the underlying mechanism was more complicated, because it represents a higher-level response from the immune system. In this case, the virus was neutralized once the PAMP (PS) signal reached a given level in the host cells. In the experiment, the PS signal increased generation after generation to a certain extent. Signal increase will be the cause of the virus neutralization reaction, which takes place in generation 100 and virus clearance in the next generation, i.e., 101.
It is worth noting that when the virus is successfully eliminated in both paradigms (humoral detection and danger detection), the performance graph (
Figure 9,
Figure 10 and
Figure 11) resembles the numerical solution (
Figure 12) of the number of virus
V per generation obtained from the ordinary differential equations [
15].
4.5. DOS Malware Detection Simulation
Figure 13 shows the evolution of DOS programs into malicious DOS programs when the computer (
Figure 7) is not protected by the TCA algorithm. However, when the TCA program is present, the DOS malware is detected. Once detected, it is successfully removed from the computer memory (
Figure 14).
5. Discussion
The main goal of this paper was to hybridize a genetic algorithm with artificial immune system algorithms in order to obtain an environment in which two agents coexist and evolve against each other. Indeed, the coexistence of two agents evolving against each other is a theoretical situation in which it is possible to conduct a very large number of simulation experiments. Our goal was to design a general framework for the study of an evolutionary WHD model, and for this reason, we have not conducted all possible simulation experiments. This work therefore represents a first step in achieving this goal. In our opinion, our model shows how using bioinspired algorithms (i.e., evolutionary computing algorithms) we can simulate detailed mechanisms which it is not possible to simulate with ordinary differential equations. Among many other reasons, the parameters of a model stand for quantitative abstractions of a given phenomenon or mechanism, e.g., the differential equations model the viral mortality rate . In consequence, in ordinary differential equations models, many relevant details are not included, e.g., all immunological processes—algorithms responsible for viral clearance. Another novel feature of our approach is host modeling. The approach developed in this paper allows us to simulate both the effects of the immune system and the evolutionary plasticity of the immune system as the virus evolves. In addition, with our approach, it is possible to simulate different guest detection mechanisms, immune system action levels, neutralization reaction, etc.
Many other simulation experiments could be conducted, for example, the simulation of the evolutionary mechanisms developed by the virus to attack the immune system of the mouse, destroying the host defenses. In addition, a replication mechanism inspired by RNApolymerase, an important enzymatic machinery responsible for self-replication of RNA viruses, could have been included in the present model. Of course, all these ideas will be of interest to virologists, immunologists, and bioinformaticians interested in the study via simulation of the host–parasite interaction. However, beyond a possible clinical application of the proposed model, our model could also be of interest in other areas such as computer science, for instance, in the design of anti-virus software such as intrusion detection algorithms in a corporation’s computer systems, especially in those cases where the invader exhibits evolutionary changes. In this paper, we simulated an experiment in which a DOS program that is initially useful to a computer evolves into DOS malware. Furthermore, the example illustrates how to encode the problem under study within the methodology we have developed. In general, the coding of the problem consists of tailoring the genetic algorithm that simulates the evolution of the guest and customizing the AIS routines [
39], written in Python, to the problem.
As the complexity of communication networks increases, computers and communications systems will require intelligent systems capable of detecting malware with evolutionary traits. Therefore, a possible strategy will be to foresee these scenarios by simulating such environments by the hybridization between genetic algorithms and artificial immune systems. That is, and because of this hybridization, the integration of AIS in the host explains its detrimental effects on the evolution and neutralization of the guest. In this paper, the hybridization technique (
Section 2.4) between the GA and the AIS is the most natural, just as it happens in biology, considering the bio-inspired flavor of the model. Note how the guest detection, the immunological paradigm (i.e., humoral immune response and theory of danger), and neutralization reaction are the most relevant steps for linking both algorithms. In fact, but with a different purpose, the implementation of the hybrid AIS–GA is not a novel approach. Nevertheless, there are previous publications in which their hybridization is addressed, but the main goal was to increase the individual efficiency of both algorithms by collaborating in a common optimization problem [
40,
41,
42]. For example, Reference [
43] found that, depending on the benchmark function, sometimes the clonal selection algorithm provides a better performance than genetic algorithms, but in other cases the genetic algorithm is the one with the best performance. In our model, the novel feature is that these two kinds of algorithms do not collaborate, but they compete with each other. Thus, the evolution and increase of fitness in the virus (guest) is at the expense of decreasing the fitness of the host—the mouse. In a similar way, and by adopting an evolutionary perspective, we could think that it happens with a computer virus (guest) and an infected computer (host). However, our model does not include the calculation of the host fitness or a mathematical model calculating host fitness variation in function of the guest fitness. As future work, we propose the designing of an antivirus algorithm based on artificial immune system [
44,
45] but including the evaluation of the computer virus danger (increased guest fitness) depending on the damage that it would cause in the computer (decreased host fitness).
In order to make this paper readable, and as we have already mentioned before, we have not included a detailed study of the parameter values’ effect on the simulation results. We have not studied all the possible values of the weights in Expression (10) of the model that implements the dendritic cell algorithm. In addition, it would be interesting to study the different models of PS, DS, and SS signals and include a mathematical model with enough detail of the inflammatory mechanism. Since there is an equivalence between biological concepts and their counterparts in computer science, we plan as future work to address the above issues, as they are fundamental in the design of intrusion detection systems. Nevertheless, in both the GA and the AIS algorithms, the most suitable values of many other parameters are known from preliminary simulation experiments. In fact, some parameters are common to both kinds of algorithms: GAs and AIS share some procedures, for example, mutation and hypermutation. In fact, these two kinds of algorithms belong to related families and are used in related problems. Moreover, GAs and AIS speak the same computing language by processing with similar mechanisms strings of symbols.
Obviously, the model of virus–mouse coexistence introduced in this paper is a metaphor that could help design bioinspired solutions in other scientific and technological fields. We are referring to those scenarios in which two agents coexist and evolve simultaneously against each other. Not only in biology or computer science does this situation arise, but economics and politics are also suitable scenarios for the application of our model.