1. Introduction
Software defined networking (SDN) is a dynamic, adaptable, and manageable paradigm that facilitates innovations in computer networks [
1], together with the prototyping and deployment of flexible routing mechanisms [
2,
3]. Traditional networking is based on manual configurations of distributed proprietary network devices, a cumbersome and error-prone process that can underutilize network resources [
4]. SDN offers a programmable architecture where routing decisions are moved to centralized controllers. SDN data plane switches are simple forwarding devices that forward the data traffic depending on the controller’s flow forwarding rules. Routing algorithms are implemented and communicated by each SDN controller to the SDN switches, which follow its instructions. Metrics such as hub count, delay, packet loss, bandwidth, jitter, and power consumption can be measured by SDN switches and sent to the controllers, which may use these metrics to determine the best routing paths and then install the flow forwarding rules in the data plane SDN switches. Indeed, the IoT [
5] interacting Cloud Services [
6] for the decision and control of the cyber-physical world create challenges for networks that achieve a better quality of service (QoS) and security and less energy consumption, and can exploit the opportunities offered by machine learning [
7,
8,
9]. These challenges can be met by SDN networks [
9,
10], which offer greater flexibility and ease of implementation [
2,
11].
These developments suggest that SDN is likely to become the preferred networking approach not only in core networks because of the centralized network intelligence and management that it enables but also for sub-networks of IoT devices and edge devices with specific QoS needs that can benefit from SDN programmability and flexibility. Thus, in [
12], conventional routing protocols such as RIP, OSPF, EIGRP, and BGP, are compared with SDN with respect to convergence times after link failures, showing that SDN routing is better than conventional IP networks. Considerable work has also shown that SDN can select routing paths based on criteria such as quality of service (QoS) [
13,
14,
15,
16], while energy-aware SDN routing has also been discussed in several papers [
17,
18,
19].
The scalability of SDN routers that conduct QoS routing has been studied in [
20], where the authors propose an SDN-based scalable QoS routing scheme between autonomous systems. In [
21], a survey of the scalability issues that arise when SDN’s centralized scheme deals with relatively frequent path updates is conducted. Both hierarchical and concurrent (distributed) approaches are investigated to alleviate SDN controllers’ additional workload. In recent work, [
22] SDN is discussed as a means to choose the best paths based on a function of time-varying traffic in order to optimize the QoS metrics of interest. Other work [
23] examines a broad class of QoS-based algorithms to assign paths to flows in SDN and analyzes the resulting performance. In addition, the work in [
24] discusses the implementation of SDN based real-time QoS in industrial settings with mobile robots or palets, where motion and reliability requirements impose changes in paths to constantly meet real-time requirements. In [
25], the use of AI-driven dynamic QoS routing in SDN is used to optimize QoS, reduce energy consumption, and improve security based on Autonomic Communications [
7] and the Cognitive Packet Network algorithm [
26].
However, in addition to scalability issues, QoS-driven SDN routing can create traffic and time-dependent changes in network topology and in the load and paths that are serviced by SDN switches. SDN network performance has been analyzed using queueing theory [
27,
28,
29,
30] and network calculus [
31,
32,
33], but these performance evaluations are based on the assumption that the network is in steady state—i.e., after a sufficiently long time—so that network metrics such as queueing delays, the length of packet queue buffers of SDN switches, and packet losses become stable (or time-independent). On the other hand, it is important to understand the time-dependent behaviour of SDN switches affected by changes in paths notified by the SDN controller. The controller can suddenly change the flows that an SDN switch receives, changing its input traffic. Furthermore, for a given switch some flows may be moved from one output port to another to comply with the new path that they must follow. These sudden changes will have performance consequences, including queueing delays and packet losses, which can only be understood via time-dependent transient analysis.
Unfortunately, conventional queueing network models are difficult to use in the transient regime because of the computational burden associated with their analysis; indeed, analyzing transients even in a simple single-server system with Poisson arrivals and exponential service times leads to the use of Bessel function expansions, and interconnected systems are quite hard to analyze in the transient case [
34,
35,
36]. The analytical solution is known only in the case of single queues with Poisson input stream and exponentially distributed service times; see [
37] for infinite and [
38,
39] for finite queues. The models use Markov chains and solve Chapman–Kolmogorov equations (first-order linear differential equations), defining the state probabilities of
n customers present in the system at time
t. The equations are solved analytically in the Laplace domain, and then the original functions in the time domain are found. Even in these relatively simple models, the solutions are quite complex—e.g., in the case of the infinite queue, the state probabilities are given in the form of the infinite series of modified Bessel functions of the first type and various order; the Bessel functions are themselves the infinite series. Some simplifications were proposed—e.g., the generating function of the distribution in the Laplace domain may be replaced by expressions with simpler original functions in the time domain [
40], or Bessel functions may be replaced by easier-to-compute functions [
41].
These analytical results do not fit well with the problem of modelling computer networks, where the streams incoming to switches are not Poisson and the sizes of packets—and therefore also the service times—are not exponentially distributed. We may introduce to Markov models interarrival and service times distributions composed of exponentially distributed phases—e.g., Cox distributions or hyper-Erlang distributions; the state definition is extended to include the current phase. There are numerous tools—e.g., [
42]—that can match a phase-type distribution to any empirical histogram. However, the initial number of states should be multiplied—in this case, by the number of phases. This substantially increases the number of equations to be solved numerically. The solution is usually obtained using an existing tool—e.g., [
43,
44]. We have applied this approach in modelling the transient states of an IP router, ref. [
45]; to represent the service time distribution, we needed a hyper-Erlang distribution with three parallel Erlang distributions with 21, 1387, and 2 phases. This could be conducted in the case of a single queue, as we are able to solve systems of millions of equations numerically, but it is hard to use this approach in modelling a network of switches. The typical method is to use either fluid flow approximation [
46] or diffusion approximation. The fluid flow approximation is much simpler, as it considers only the time-dependent mean values of flows, queues, and delays. However, its errors are much larger than those of diffusion approximation; see a comparison in [
47]. On the other hand, discrete event simulations of transients require many hundreds of independent repetitions of simulation runs to achieve a sufficient statistical accuracy, making the computation times of such simulations prohibitive [
48].
Thus, in this paper we extend the approach we developed in [
49] to the time-dependent analysis of multiple SDN switches using diffusion approximations [
50,
51], which are very convenient to analyze in a time-dependent regime. The accuracy of diffusion approximations has been validated in industry-based research over many decades [
52,
53,
54,
55], including for patented techniques [
26], and also validated in many academic papers [
56,
57,
58]. Their advantage includes a more accurate representation of interarrival and service processes, the ease of obtaining delay predictions from traffic measurements, and much faster numerics for transients than discrete queueing models [
59] or simulations. Thus, we compute the transient behaviour of each SDN switch after changes occur in its input traffic rate. Packet loss probabilities can also be computed even when they are “tiny” and impossible to estimate by conventional means. The analysis we undertake considers both single SDN switch and multiple interconnected SDN switches controlled by an SDN controller.
The rest of the article is organized as follows.
Section 2 presents the queueing model of an SDN switch, justifies its simplifications, and introduces the diffusion approach’s principles. In
Section 2.1, we give a detailed steady-state diffusion model of a single queueing station representing the SDN switch, and in
Section 2.2 we extend this to the transient-state model.
Section 3 presents a numerical analysis of the model from
Section 2.2. We choose various parameters of the traffic and determine a scenario for the traffic changes. The obtained results include queue distributions, mean queues, and loss probabilities at the switch as a function of time.
Section 4 presents the network model; the network may have any number of switches and any topology.
Section 5 studies the transient behaviour of a chosen network of several nodes. It also gives a simple example of how the model may predict queue evolution and minimize the total backlog at nodes during a time interval. Conclusions are given in
Section 6.
2. Diffusion Model of an SDN Switch
Figure 1 describes the basic system architecture of an SDN switch proposed in [
1]. Arriving packets are temporarily queued at the input buffers and are then removed by the Arbiter and placed scheduled into the Packet Buffer. A copy of the packet header is forwarded to the Parser. The Parser parses the packet header to extract the header fields and then creates a tuple with the extracted information and forwards it to the Flow Match Unit. In the Flow Match Unit, the tuple is matched against existing flow rules stored in Flow Tables’ flow entries. The flow entries in the Flow Tables are maintained under the controller’s guidance and are updated when the controller installs new flow rules. The Flow Match Unit determines whether the packet is associated with a known flow and hence a known path. In case of success, the packet is then forwarded via the backplane. In case of failure (no flow table entry matches the packet header), a packet-in message will be sent by the SDN switch to the corresponding SDN controller [
60] to notify the controller about the absence of a flow rule for the packet. The packet-in message contains either the packet or the packet’s ID. The controller decides the correct action for the packet and then installs appropriate data in the flow table of the switch so that packets belonging to that particular flow can be forwarded subsequently. If there is no corresponding response from the controller, the packet will be dropped.
The delay that can be experienced by packets consists of the queueing delay in the input buffer, the service delay in the input buffer, and the delay at the output buffers. When the output ports’ processing and line speeds are significantly greater than the Openflow processing time, which includes the time required to parse the packet, check the flow tables to find matching entries, and execute the flow rule actions, the switch can be represented by a single server queueing as in several recent papers [
27,
28,
29,
30,
61,
62,
63]. Since the size of the input buffer is limited, we represent the SDN switch as a single server queueing model with finite capacity
N. The packet interarrival times depend on the type of traffic, which the switch carries so that we allow for a generally distributed interarrival time distribution. The service time distribution will also be general to represent the manner in which the packets are handled in the entry buffer.
The flow table matching process involves searching the flow tables to find the entries containing corresponding action sets—i.e., flow rules that match the header of the packets of the input flows that pass through the Parser. If there is more than one flow table, the flow match process starts from the first flow table, searches all of its entries, and jumps to the next flow table. The search process continues till an entry that matches the packet header is found. Otherwise, a “packet-in” message is generated. For hardware switches whose flow tables are implemented using Ternary Content Addressable Memory (TCAM) modules, the flow match process can be performed within one clock cycle, with parallel access to all the entries of the flow table, resulting in a constant access time. Because of the limited TCAM flow table size and the power-hungry nature of TCAM in hardware switches, software-based switches are an attractive alternative. However, they are limited by a slow processing speed when flow tables are searched sequentially. A majority of previous papers model the matching time as an exponentially distributed random variable [
27,
28,
29,
61,
62,
63]. The use of a diffusion process allows us to model the flow matching process with a service time distribution that includes the flow tables’ sequential search.
Denote by p the probability that the flow rule of the arriving packet is not installed. The switch knows it after the examination of all K entries stored in the Flow Match Unit—i.e., time —where the examination of each entry requires time T. As a consequence, p is the probability that the service time is a constant , representing the case when all the flow entries in the table are examined without success, while with probability the packet’s flow match is found in a time that is uniformly distributed in —i.e., on average in time and variance . Sophisticated hardware means may also be designed to improve this performance but are not considered in this paper.
To analyze this system, we use a continuous state space and continuous time diffusion process
to replace the discrete state-space buffer queue, where the increments
are normally distributed, with mean
and variance
, which appear in the diffusion Equation (
1).
Assuming an arrival rate
and average service time
, the changes in a small time interval
tend to a normal distribution with mean
and variance
, where
and
are the variances of the interarrival and service times, and
and
are the corresponding squared coefficients of variation. Therefore, for the diffusion process we have
and
[
64].
The buffer’s size is limited to
N packets, therefore the diffusion process resides in the interval
, and we use a diffusion process with returns from the barriers at
and
to represent the jumps that occur when the buffer queue is empty and a packet arrives, and when the queue is full and a service occurs as in [
65], leading to the equations:
where
is the probability density function (pdf) of the diffusion process;
and
are, respectively, the probabilities that the process is at the barrier at
or
at time
t, corresponding to probabilities that the system is empty or saturated; and
is the Dirac delta function.
The first of the above equations defines the pdf of the diffusion process with jumps from to (arrival of the first customer after the idle period) with intensity and from to (departure of a customer ending the saturation period) with intensity . The next two equations represent the probability balance of the barriers.
2.1. Steady-State Diffusion Model with General Interarrival and Service Processes
In steady state, when
,
,
, Equation (
1) becomes an ordinary differential one and its solution, for
,
, can be expressed as [
64]:
where
, and due to normalization:
In this way,
, given by Equations (
2) and (4), approximates the steady-state distribution
in the Packet Buffer queue. A few examples of the curve
depending on
—i.e., the utilization of the system—are presented in
Figure 2. The next figure presents
, which is the loss probability due to the buffer overflow.
The steady-state queueing delay can be modelled by the time it takes the diffusion process to drift from the point
, corresponding to the queue length at the moment of the packet arrival, to
when the packet is already on the head of the queue (its distance to the transmitter is equal to zero), and is removed to be forwarded. The density of the diffusion process
given by Equation (
2) determines the queue distribution and, at the same time, the density of the initial point
at Equation (
8).
The density
of the diffusion process starting at
and ending at the absorbing barrier at the origin is given in [
66]. The method of images, usually applied to heat conduction problems, is used. We may imagine the barrier as a mirror with an image source placed at
, and the solution is a superposition of a source of unit strength, placed at the origin and a source of strength
placed at
:
The density function
of the first passage time from
to
, i.e., probability density that the process enters the barrier at time
t, is equal to the probability density that the process is leaving the diffusion interval (
):
This density should be normalized to include only the cases when the process ends at the barrier, which is certain for
. Therefore:
The first passage time of the diffusion process from the point
to the barrier at
becomes:
Suppose that a newly arrived packet joins the queue when the switch already contains
x packets. Assuming the first-in-first-out service, the packet will be forwarded out from the switch after all the packets that arrived earlier have been forwarded, so that if the queue length probability density function is
, the probability density function of the packet’s queueing delay is:
Figure 3 illustrates this result with a few curves of
for different values of the traffic intensity
, and with the parameters;
, which have been used in all the examples of
Figure 2,
Figure 3 and
Figure 4.
The mean delay experienced by a packet whose flow rule is contained in the flow table will be the sum of the queueing delay and processing time. If the mean queueing delay is
and the processing time is
, then the mean packet delay at the switch
is:
However, if a flow entry for an arriving packet is not found, then the packet is encapsulated and sent to the controller, which determines the flow rules, installs the flow entry for the packet, and sends back the packet in the packet-out message. In that case, the delay experienced by the packet,
D is the sum of the delay in the switch and the delay at the controller
[
27,
28,
29,
61,
62,
63]:
The modelling of the controller’s working mechanism to determine
is beyond the scope of this work and will be considered in future works, and in our numerical examples all the flows are known, i.e.,
.
2.2. Transient Diffusion Model with General Interarrival and Service Processes
We express the time-dependent solution
of Equation (
1) for the diffusion process having barriers with jumps with the use of the pdf
of another diffusion process which has absorbing barriers at
and
. There is no jumps, the process ends once it reaches any of the barriers. The density of such a process is known [
66]:
where:
and
,
.
If the initial condition is not given by a single point
but by a density
,
,
, then the pdf of the process with absorbing barriers has the form:
The Laplace transform of
is -4.6cm0cm
with
.
The process with jumps from barriers is a sequence of processes started by jumps at
and
, ending at the barriers, and then after a time spent there starting again due to jumps, therefore the pdf
of the diffusion processs with jumps may be represented by functions
as follows: [
51,
67]:
where
and
are the densities of jumps and are related to the pdfs
,
, the densities of sojourn times at
and
respectively, and to
and
, the probability densities that at time
t the process enters
or
, in the following way:
where:
and
,
,
,
are the densities of the first passage time between corresponding points—e.g.,:
Hence: The functions , are the pdfs of first passage time when the initial condition is given by the density .
The delay introduced by the queue length distribution with density
is
The convolutions of functions present in Equations (
14)–(
16) are more tractable when we consider the equations and solve them in the Laplace domain, receiving:
The process is in the barriers with probabilities
The original of
is obtained via the numerical inversion of the solution (
19); in the examples, below Stehfest’s algorithm [
68] was used.
Note that the above transient solution assumes constant diffusion parameters. In the case of time-dependent traffic, and therefore time-dependent diffusion parameters, we should apply it in short consecutive intervals of time where we may treat the parameters as constant.
3. Single SDN Switch Model
We use the above model to examine the behaviour of a single SDN switch. Assume the length of the packet buffer , s, , and —i.e., all connections have their entries in the Flow Match Unit; in consequence, the service time is uniformly distributed with the mean msec and the squared coefficient of variation is .
Figure 5 presents an example of the distribution of interarrival times between IP packets taken from CAIDA measurements; on this basis, we determined
for the input flow. Additionally, we considered the greater variability of interarrival times expressed by
and
to investigate its influence on the performance of the switch.
The changes in the input rate
during the considered interval of 1 s are plotted in
Figure 6 and
Figure 7.
Figure 8 displays an example of the probability density function in Equation (
14) for a chosen moment
s. when the empty queue is empty at the beginning of the interval under consideration, i.e.,
at
; it shows the impact of the interarrival time variability on the density function. It is worth to note the logarithmic scale of the plot and the possibility to determine by the model very small numerical values of the function.
Figure 6 shows the changes in
—i.e., probability of packet loss due to the buffer congestion—following the input traffic pattern, and also as a function of the variability of interarrival times, defined by the squared coefficient of variation
. Again, even very small numerical values are given by the model, and the computations are stable.
Figure 7 presents the time-dependent mean queue. When traffic intensity
is small, transient periods are short, and the changes in the mean queue follow the changes of
closely. However, for greater values of
, i.e., for higher utilization
, the transient states are visibly longer than the time between the traffic changes, and the queue is permanently in a transient state. The increase in
also increases the duration of transient states and the size of the queue.
4. Transient Analysis of Multiple SDN Switches
Consider a network of
M stations with an arbitrary topology with routing probabilities
. We follow the approach of [
69] developed for the steady-state network model then adapted to transient analysis in [
70]. Additionally, we introduce time-dependent routing to model an SDN network.
The first step to solve the network model is to decompose the network—i.e., to determine the input traffic parameters , at every station i and then apply the single server model of the previous section to each station separately.
In the transient state, we should distinguish at any station
i the input traffic
and the output traffic intensities
:
which are different;
denotes the probability that the station
i is idle at time
t, i.e., the diffusion process related to this station is inside the barrier at
. The term
presents probability that the station
i is busy and customers are leaving it with the rate
.
The traffic equations balancing the flows of stations are:
where the first term
represents the traffic flows coming from the outside of the network directly to station
i.
The routing probabilities
change each interval
following the decisions of the controler, remaining constant inside the interval, and the flow parameters may change every interval
; we assume
, in numerical examples below
. This way all model parameters are constant witin intervals
when the solution (
14) is computed.
Denote by
and
the density functions of the interarrival and service times distributions at station
j at time
t. The pdf
of the interdeparture times from this node at time
t may be expressed as:
where * denotes the convolution with respect to
x. The first term of the right side in (
22) represents the interdepature times of packets when the node
j is working, and the second term gives the interdeparture times when it is idle. The formula (
22), known as Burke’s theorem [
71], is exact for Poisson input (the pdf of the idle period distribution that should be used in the second term of (
22) is the same as
and approximate in other cases. From (
22), we receive:
where
,
, and
are time-dependent square coefficients of the variation in interdeparture, service, and interarrival times, respectively. Packets leaving the node
j according to the distribution
choose any node
i with probability
and the times between two packets routed from node
j to
i has pdf
For example, a packet leaving station
j goes to station
i with probability
or with probability
it goes elswhere but the second packet goes to
i with probability
, hence the gap has has pdf
with probability
, etc., or, after Laplace transform:
Then we compute the squared coefficient of variation:
Hence:
where the parameters
and
refer to the flows coming to station
i from outside of the network.
The parameters of the input flow at station
i are given by (
21) and (
25). Equations (
23) and (
25) form a system of linear equations yielding
and also the diffusion parameters
,
for every node
i. At each interval
, the functions
providing the queue length distributions at every station
i for
are computed. Their values at the end of the interval yield, among others, the current utilizations
used to determine the flow parameters and diffusion parameters for the next interval
.
The pdf
of the time-dependant response time (waiting time plus service) is determined using the first passage time from the end of the queue to zero, as defined by Equation (
18). If
is the response time pdf at node
i, then the response time pdf
for the path
of
n stations is:
or:
The loss probability
for same entire path may be computed from:
where
is the probability that the queue at station
i is saturated at time
t—i.e., the diffusion process for this station is at time
t at the barrier
.
5. Network Model
Consider a network composed of four SDN switches,
–
; see
Figure 9. Their parameters are the same as the switch in Example 1, except for the SDN switch
, which is twice as fast. Therefore
packets/sec while
packets/s. At all the switches, the squared coefficient of variation of service time is identical to the value
.
Similarly to Example 1, the network’s performance is investigated during 1 s. Host 1 is sending packet flows of intensity
to Host 2, and the traffic rate is changing in the range 500–2500 packets/sec, as shown in
Figure 10. Host 4 generates traffic at rate
, as shown in
Figure 10, which is forwarded to Host 2 via the SDN switches
and
.
As in Example 1, the squared coefficient of variation of interarrival times in the flows is , or or . The second input traffic at has only one parameter .
The SDN controller alters—if needed—the routing to balance the load of nodes every 100 msec; in this example, it refers to the routing probabilities
and
; see
Figure 11.
Figure 12 and
Figure 13 illustrate the decomposition of the network model. They present the flows
given by Equation (
21) and the squared coefficients of variation
received from Equations (
23) and (
25). The service times at the stations have relatively small squared coefficients of variation
. Therefore, the important variability of the first flow entering the network is reduced at the network interior, as defined by Equation (
23); see
Figure 13.
The transient solution of diffusion equations is computed in intervals of the length
msec—i.e., we have 100 intervals with fixed diffusion parameters; at the end of each
, the Equations (
21) and (
25) are solved to determine the new parameters of flow for the single-station models in the next interval. The diffusion density function obtained for any station
i at the end of an interval gives the initial conditions for the diffusion equation at the next one.
The curves in
Figure 14 compare the loss probability (note here the minimal values computed by the model), and, in
Figure 15, the mean queues for all four stations, in case of
. We may observe the changes in mean queues in
and
due to load balancing after the second flow becomes active. Observing the mean queues at
and
, we can see that the transient periods may be longer than the time between the controller’s decisions. As noted earlier, the length of the transient time increases with a load of a station and the variability of the input flow. For greater variabilities of the first flow, the path
becomes saturated; see
Figure 16. This happens due to saturation in
, as shown in
Figure 17.
Let us also consider a
simple example of optimization. Suppose, as previously, that station
is forwarding a flow
packets to nodes
and
. Station
is additionally receiving from Host 4 a flow of
packets. The controller is changing routing every
msec and needs to determine the routing probabilities for the nearest
, knowing the current parameters of flows at the beginning of the interval, as well as the current queue distributions at
,
, and
representing previous behaviour of the network. The goal is to minimize the mean backlog
at
and
during
:
We compute
,
for
and minimize
by the choice of
,
; see
Figure 18.
6. Conclusions
The advent of SDN allows the implementation of smart adaptive routing, which changes network paths so that new connections may be established and inactive may be removed, as well as to deal with changes in traffic loads and incidents that affect network security. This leads to an interesting paradigm shift in network modelling, which has traditionally addressed “long term” behaviours and computational methods which are appropriate for steady-state analysis. However, when SDN intervenes dynamically to change paths and traffic levels, the network is seldom at a steady state, and optimization must take transients into account.
Therefore in this paper we have used diffusion approximation modelling for the performance evaluation of a network of SDN switches, that considers both steady-state and transient analysis. We have shown how changes in routing or forwarding decisions by the SDN controller can influence performance parameters such as delay, queue size, and packet loss probability in the transient state. Our results indicate that this method is computationally operational and can provide useful quantitative results for models with realistic parameter values.
Our analysis captures the interactions among the main parameters of the network, and numerical examples display the dependence of the queue lengths, queueing delays and their dynamics as a function of the changing flow intensity and variance of interarrival times. Our approach also confirms that transient periods play a significant role in the performance of SDN networks, and that they will be useful to analyze much larger networks in future work.
While the performance evaluations performed in this paper are purely numerical, and based on diffusion approximations models that have been widely validated by simulations [
47,
67,
72], in future work we intend to use network emulation tools such as Mininet with real traffic, as well as experiments on a SDN test-bed, to study the influence of time dependent forwarding decisions on the main performance metrics of large SDN networks.