In this section, an algorithm is proposed to extend the bit independence criterion (BIC) to stream ciphers, experimentally confirming its effectiveness. The two main differences that arise in this scenario with respect to its application in S-boxes are discussed.
The following sections describe each of these steps and end with the proposal of an algorithm to evaluate the BIC in stream ciphers.
3.2. Test of Independence between Two Avalanche Variables and
Second difference. In [
13], Pearson’s correlation coefficient
was used to measure the degree of independence between the pairs of avalanche variables. The use of such a coefficient in [
13,
29] has two main disadvantages: the first one is that it only detects linear correlations, and the second one is that the critical region for the rejection of the null hypothesis is not explicitly defined, i.e., a threshold is not defined below which
is decided. Thus, it can be a reason for an imprecision in the decision when dealing with small coefficient values. In order to solve the first aforementioned disadvantage, mutual information can be applied to measure the degree of independence between pairs of avalanche variables [
44], but in this case, it is important to determine which estimator to use, since there are no estimators of unbiased entropy of minimal variance; the second disadvantage can be solved by defining the critical region using a transformation of the correlation coefficient of the type
, where
t is distributed as a
t-Student distribution with
degrees of freedom [
45].
Another approach is that when
and
are independent, then
is balanced [
46]. In this work, independence will be evaluated by measuring the adjustment
to the binomial distribution
, where
is the Hamming weight. This allows setting a threshold for the decision criterion on independence between
and
.
Since
is a binary matrix, the adjustment to the binomial distribution will be measured by the
-test with 1 degree of freedom, with the test hypothesis given by:
The test statistic used is
As usual [
2], the value
is such that
If the null hypothesis is rejected. It is left for future works, to compare the effectiveness of these three criteria for evaluating independence between the avalanche variables.
3.3. BIC Acceptance Test
To decide whether the stream cipher
f satisfies the BIC, it is necessary to take into account the number of rejections of
on the
n matrices; for this, a random variable
T, which counts the total number of rejections on
n matrices is defined:
where
and
The variable counts the number of rejections of the null hypothesis in the matrix .
Expected number of rejections of . In each of the n SAC
matrices,
pairs of columns are formed, thus the number of rejections
T satisfies
When , we have the ideal case for compliance with the BIC, since all the pairs of columns are independent, while as , the number of non-independent column pairs increases.
Under the hypothesis test above, with a significance level
, the expected number of rejections of
is:
for each matrix
. In total, among the
n matrices SAC are expected
rejections.
The random variable
follows a binomial distribution
. Taking into account that generally
, this distribution can be approximated, in this case, to the Poisson distribution with parameter
. Since
is large, due to large values of
, then the Poisson distribution can be approximated by the Normal distribution with mean and variance:
Decision criteria. To compare the value with the distribution, a significance level is selected. Then, it is tested if f does not satisfy the BIC, with a significance level , if . It can be seen that if , then the values of decreases with respect to and is not satisfied, so the BIC is fulfilled. On the other hand, if , then the values of will be greater as T increases, so is satisfied and the BIC compliance is rejected.
Normality of the test statistic T. In the expression of T there are Bernoulli variables , whose distributions under and are different:
Under , all variables are independent, identically distributed and take the value of 1 with probability , so T follows exactly a binomial distribution . Although generally the binomial distribution can be approximated by the normal distribution, with mean and variance , taking into account that grows very quickly with m.
Under
, the variables
that appear in the expression of
T are not identically distributed, since the rejection of the BIC means that there are several matrices
for which the hypothesis
of independence between
and
is rejected. In this case,
and may be different when
varies. For this reason, a binomial does not appear directly as the distribution of
T. However, it is still possible to approximate the distribution of
T by the Normal distribution. For this it is sufficient to calculate the mean
between the probabilities of all the variables
and the distribution of
T can be approximated by the binomial distribution
. This distribution, in turn, can be approximated by the Normal distribution, taking into account high values of
. The precision of this approximation depends on the difference between the probabilities
involved in
, therefore the variance value between these probabilities can be a measure of the quality of the approximation.
When comparing the distribution of T under and , similarities and differences are observed. They are similar in that in both cases T follows a Normal distribution, but there are two differences, the first and most important is observed between the expected values of both distributions (it will be higher under ) and the second refers to the level of adjustment to this distribution (may be lower under ). In the rest of this work, the proposed method to evaluate the BIC in stream ciphers will be called the BIC test.
3.4. BIC Test Algorithm
Given a set
of
l randomly chosen
n bits inputs to the function
f, constructs for each binary vector
its associated SAC matrix
and for all for
with
, it is checked if
follow the
distribution, see the proposed Algorithm 3.
Algorithm 3 BIC stream ciphers algorithm |
- Input:
f function to evaluate, n size of the inputs of f, m size of the outputs of f, and levels of significance, D set of l inputs to the function f. - Output:
If f satisfies the BIC - 1:
- 2:
fordo - 3:
for do ▹ Matrix Construction - 4:
Compute - 5:
end for - 6:
for each do ▹ Independence check between and - 7:
if then - 8:
▹ Independence is rejected between and - 9:
end if - 10:
end for - 11:
end for - 12:
ifthenf does not satisfy the BIC - 13:
elsef satisfies the BIC - 14:
end if
|
3.4.1. Complexity of the Algorithm
In steps 3–5 of the algorithm, f is used to generate m output bits. Assuming that the stream cipher f generates each output with a constant cost, then operations are performed in these steps, since l times m output bits are generated from f. In steps 6–10 of the algorithm, operations are performed due to the computation times the Hamming weight in a sequence of l bits.
Thus the algorithm performs operations, and the number of algorithm operations depends on the number n of input bits, the number m of output bits, and the number l of inputs used. It can be seen that the increase in the parameter m has a greater influence than n and l in increasing the number of operations of the algorithm. In the particular case , operations are performed.
3.4.2. Parameter Selection
As seen in the previous section, the number of operations of the BIC algorithm depends on three parameters, the number l of inputs, the number n of bits of each input, and the number m of bits of each output.
Selection of l such that and fit to the binomial distribution . The number
l of entries influences the effectiveness of the
-test in determining whether two columns are independent. Increasing
l guarantees a greater fit of
to the binomial distribution
; however, it causes an increase in the number of operations. In practice, the idea is to obtain a cost-effectiveness ratio using a value of
l such that it maintains the fit and provides a practical number of operations. Using the confidence interval for proportions [
47], it is possible to obtain a value of
, such that prefixing
achieves a good fit. This confidence interval is given by
Solving for
l we get to
where
, is the deviation of
over
p, and
.
Example 1. Calculation of the lower boundforl. A value from which, with high probability, it is satisfied that is needed. Then, substituting for a significance level and a deviation e whose absolute value satisfy inequality , we get In this way, for the significance level and the deviation e selected, it is concluded that l must be chosen such that .
Example 2. Convergence ofand deviation. Table 2 shows the behavior of the deviation observed for several l, , with and . It can be seen how, for most of the estimated e, the imposed condition is met . Selection of under the null hypothesis . The number n of inputs and the number m of outputs influence the sample size for the calculation of the number T of rejections of . In general, we will have pairs of columns to check and it is expected, with probability , that pairs of columns will be rejected.
Let be some default value of from which the distribution of T can be approximated to . It is necessary to select n and m such that is satisfied and a value of such that is obtained. It is advisable to select a high value of that avoids the use of corrections and provides a good fit.
It is known that increasing
provides better precision in the Poisson approximation to the Normal distribution. To obtain
, we can use the confidence interval for proportions [
47], this time in an approximation to the Normal distribution with one tail. So, we have
Example 3. Calculation of the lower boundford. Substituting, , , with a significance level and a deviation of , we obtain Then, , therefore, for the values of and e chosen, it is enough to select values of n and m such that . In Table 3, for , some values of n and m are highlighted in italics from which . To select n, m and l, the trade-off between reducing computational cost and maximizing effectiveness can be taken into account. However, it is very important to be careful when selecting which values to use, since minimizing computational cost could limit the effectiveness of the BIC method and overestimate the quality of the stream cipher. It is advised to prioritize increasing effectiveness.