2. Bitwise Exclusive-or Sum of Random Variables
The mathematical environment for the method and theorem proposed in this paper is the probability space and operations on its elements. A probability space is defined by a triple (Ω, F, P), where [12,13,14]:
- (1) Ω is any non-empty set called the sample space.
- (2) F is a σ-algebra or σ-field, i.e., a collection of subsets of Ω that satisfies four postulates: Ω ∈ F; for any subset A of Ω, if A ∈ F, then its complement Ω \ A ∈ F; for any countable collection of subsets A_1, A_2, … of Ω, if every A_i ∈ F, then ∪_i A_i ∈ F; and, for any countable collection of subsets A_1, A_2, … of Ω, if every A_i ∈ F, then ∩_i A_i ∈ F.
- (3) P is a countably additive measure, i.e., a mapping from F to the closed interval [0, 1] such that P(Ω) = 1 and P(∪_i A_i) = Σ_i P(A_i) for any countable collection of pairwise disjoint sets A_i ∈ F.
The sample space Ω is the set of all possible random outcomes of some experiment. A random variable assigns a numerical value to each of those outcomes [15]:
Definition 1. Given a probability space (Ω, F, P), a random variable is a function X from Ω to the real numbers R such that
{ω ∈ Ω : X(ω) ≤ x} ∈ F for every x ∈ R. (1)
Formula (1) states that the function X must be measurable. For a continuous probability space, the sample space Ω is uncountable, and the random variable X can assign an uncountable number of values. For a discrete probability space, the sample space Ω is finite or countable [12,13,14,15].
Definition 2. We say that X is a discrete random variable if the function X assumes only a finite or countable number of values.
In this paper, we assume that a random source S produces the sequences of symbols s_n, n = 1, 2, …. The symbols are represented unequivocally by l-bit integer words 0, 1, …, p − 1 from the finite set A = {0, 1, …, p − 1}, where p = 2^l. The process of emitting symbols can be modeled as a stochastic process {X_k}, k = 1, 2, …, where X_k are discrete random variables with values from the closed interval [0, p − 1].
Definition 3. The entropy of a stochastic process {X_k} is defined by [16]
H(X) = lim_{k→∞} (1/k) H(X_1, X_2, …, X_k), (2)
when the limit exists.
Formula (2) determines the average rate of the asymptotic growth of the entropy H(X_1, X_2, …, X_k) with increasing k if the limit exists [16]. The entropy H(X) is also termed the entropy rate. The value of H(X) is expressed in bits per time-step (bit/time-step). If {X_k} is a sequence of independent and identically distributed (i.i.d.) random variables, then [16]
H(X) = H(X_1).
If {X_k} is a sequence of independent but not identically distributed random variables, then [16]
H(X) = lim_{k→∞} (1/k) Σ_{i=1}^{k} H(X_i).
Following Cover and Thomas, it is worth noting that we can choose sequences of distributions on A such that this limit does not exist [16].
The greatest value of H(X) is equal to l bits per time-step. This value is obtained when {X_k} is a sequence of independent and uniformly distributed random variables. Thus, the key to the greatest entropy is the independence and uniformity of the distributions on A.
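These two cases can be checked numerically. The sketch below computes the per-symbol entropy of distributions on 2-bit words (the distributions are illustrative, not taken from the paper); only the uniform one reaches the maximum of l bits per time-step.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as probabilities."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

# l = 2, p = 4: a biased and a uniform distribution on the words 0, 1, 2, 3
biased = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25, 0.25, 0.25, 0.25]

print(entropy(biased))   # 1.75 bits/time-step, below the maximum
print(entropy(uniform))  # 2.0 bits/time-step, the maximum l = 2
```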
Because the values of X_k are encoded by l bits, the random variable X_k can be written unequivocally as a sequence of l binary random variables X_k^(i), i = 1, 2, …, l, i.e., X_k = (X_k^(1), X_k^(2), …, X_k^(l)).
Let S′ be another random source that produces sequences of symbols s′_n, n = 1, 2, …. As previously noted, the symbols are represented unequivocally by l-bit integer words 0, 1, …, p − 1 from the finite set A = {0, 1, …, p − 1}, where p = 2^l, and the process of emitting symbols can be modeled as a stochastic process {Y_k}, k = 1, 2, …, where Y_k are random variables with values from the closed interval [0, p − 1]. Each random variable Y_k can be written unequivocally as a sequence of l binary random variables Y_k^(i), i = 1, 2, …, l, i.e., Y_k = (Y_k^(1), Y_k^(2), …, Y_k^(l)).
Definition 4. A bitwise exclusive-or sum Z_k = X_k ⊕ Y_k of independent random variables X_k and Y_k is an operation on sequences (X_k^(1), …, X_k^(l)) and (Y_k^(1), …, Y_k^(l)) of equal length of binary random variables that combines X_k^(i) and Y_k^(i), i = 1, 2, …, l, using the exclusive-or operation, i.e.,
Z_k^(i) = X_k^(i) ⊕ Y_k^(i), i = 1, 2, …, l.
If X_k^(i) and Y_k^(i) are binary random variables, then Z_k^(i) = X_k^(i) ⊕ Y_k^(i) is also a binary random variable. Consequently, Z_k = X_k ⊕ Y_k are random variables with values in the closed interval [0, p − 1]. The probability that X_k assumes a given value x from [0, p − 1] is equal to P(X_k = x). Similarly, the probability that Y_k assumes a given value y from [0, p − 1] is equal to P(Y_k = y). For the random variable Z_k,
P(Z_k = z) = Σ_{x=0}^{p−1} P(X_k = x) P(Y_k = x ⊕ z),
where x ⊕ z ∈ [0, p − 1].
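The distribution of the exclusive-or sum can be evaluated by direct enumeration; a minimal sketch with two illustrative distributions on 2-bit words (the probabilities are ours, not the paper's):

```python
def xor_distribution(px, py):
    """P(Z = z) = sum over x of P(X = x) * P(Y = x xor z), for independent X and Y
    taking values 0, ..., p - 1."""
    p = len(px)
    return [sum(px[x] * py[x ^ z] for x in range(p)) for z in range(p)]

px = [0.4, 0.3, 0.2, 0.1]   # biased distribution of X (l = 2)
py = [0.7, 0.1, 0.1, 0.1]   # biased distribution of Y
pz = xor_distribution(px, py)
print([round(q, 2) for q in pz])  # [0.34, 0.28, 0.22, 0.16]: less biased than either input
print(abs(sum(pz) - 1.0) < 1e-9)  # True: a valid probability distribution
```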
3. Increasing the Entropy of a Sequence of Discrete Random Variables with an Auxiliary Table
As noted in the previous section, the process of emitting symbols from S can be modeled as a stochastic process {X_k}, k = 1, 2, …. The maximal entropy of {X_k} is obtained when X_k are independent and uniformly distributed random variables. In this section, we assume that X_k are independent but not uniformly distributed. They can also have different distributions for different k. We search for a computationally efficient algorithm for processing a sequence of symbols emitted by S that provides at the output a sequence that can be modeled as a sequence of independent and uniformly distributed random variables. The proposed algorithm uses an auxiliary table T with L cells and has the following form:
Algorithm XOR-A
Initialization:
Fill all of the cells of T with the L subsequent l-bit words produced by S;
Computations:
end;
The subsequent l-bit words x_n are written cyclically into table T and addressed from 0 to L − 1. During each step, we choose R + 1 l-bit words, where x_n is the current word and the remaining R words are read from T. The addresses of the words read from T depend on the random number r_n produced by an auxiliary random source with a uniform distribution. The elements of the R + 1 vectors are next summed modulo 2, which forms a single output word z_n with values from A. To avoid multiple summing of the same number, the size of T satisfies the condition L = R·2^m, where m is an integer. For the same reason, we use r_n + 1 instead of r_n. Producing an N-element sequence requires the emission of (N + L) l-bit words by S.
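A minimal Python sketch of Algorithm XOR-A following the description above (a table of L = R·2^m words filled and then overwritten cyclically; R reads at distances that are multiples of 2^m, shifted by the uniform auxiliary number; a bitwise XOR of all R + 1 words). The exact address arithmetic is our assumption, since the paper's formula is not reproduced here, and the biased source is purely illustrative.

```python
import random

def biased_source(l, rng):
    """A hypothetical biased source of l-bit words (smaller values more probable)."""
    p = 2 ** l
    while True:
        yield min(rng.randrange(p), rng.randrange(p))

def xor_a(source, n_out, R, m, aux):
    """Sketch of Algorithm XOR-A; the addressing scheme is an assumption."""
    L = R * 2 ** m                          # table size, L = R * 2^m
    T = [next(source) for _ in range(L)]    # Initialization: fill T with L words
    out = []
    for n in range(n_out):
        x = next(source)                    # current l-bit word x_n
        r = aux.randrange(2 ** m)           # uniform auxiliary number r_n
        z = x
        for i in range(1, R + 1):           # XOR with R words read from T;
            z ^= T[(n + i * 2 ** m + r) % L]  # the R addresses are distinct
        T[n % L] = x                        # write x_n cyclically into T
        out.append(z)
    return out

rng = random.Random(1)
z = xor_a(biased_source(8, rng), 1000, R=4, m=2, aux=rng)
print(len(z), all(0 <= v < 256 for v in z))  # 1000 True
```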
Algorithm XOR-A uses the property that a sequence formed from a sequence of independent random numbers is also random. This property was used by MacLaren and Marsaglia [17] to permute sequences from the linear congruential pseudorandom generator with an auxiliary, simpler pseudorandom number generator. They noticed that the shuffled sequence might have better statistical properties and a longer period. This method is known in the literature as Algorithm M [18]. Another heuristic approach used in algorithm XOR-A is the observation that XOR-combining random words can also provide sequences with better statistical properties, in particular with higher entropy rates [19]. No exact theory explaining why combined generators can significantly improve the properties of sequences has been published yet; we know only heuristic arguments. For example, Deng and George, and later Deng et al., provided arguments based on probability densities [20,21]. Some heuristic arguments come from Marsaglia [22]. One intuitive argument for why combining improves the statistical properties of sequences, and why it is better than shuffling, is that a combined generator can produce new numbers, whereas shuffling only permutes the existing elements.
An important factor in combined generators is the choice of the mathematical operation that combines independent digital words. Because combining should not significantly increase the computational effort, only simple operations are considered: addition; addition modulo p, where p is an integer; and bitwise exclusive-or. The latter has a mixing property that is commonly used, e.g., in ciphers. Examples include the Data Encryption Standard, the Advanced Encryption Standard, and the secure hash algorithms SHA-1 and SHA-2 [11,23,24]. The XOR operation applied to sequences of bits produced by independent source generators also reduces the correlation between adjacent bits [3,6]. To use this property, we can divide a sequence into, e.g., l-bit disjoint blocks and compute the XOR function for each block. As a result, a sequence with a lower correlation between adjacent bits is obtained, but this method reduces the bit rate l times. Combining source streams according to Algorithm XOR-A decreases the output speed only slightly because of the fast operations used (this problem will be discussed at the end of Section 3). The distance between the elements that form the i-th, i = 1, 2, …, R, source stream changes randomly because r_n is random. Let us emphasize that this distance cannot be constant, because the obtained source streams would then be shifted versions of the same sequence. It also cannot be deterministic, because elements of the source streams could repeat periodically, producing periodic patterns in the output sequence. This limits the use of algorithm XOR-A to sequences with a length not greater than the period of the auxiliary generator when the source of the numbers r_n is deterministic. However, if r_n is produced by an entropy source, then this limitation does not exist.
Algorithm XOR-A provides a sequence of l-bit words z_n. Because a bitwise exclusive-or sum of random variables is also a random variable, each z_n is a value of a certain discrete random variable Z_n, n = 1, 2, …. Each set of R + 1 l-bit words is formed randomly and independently of the previous sets. Consequently, Z_n and Z_{n′} are also independent for any n ≠ n′. The open problem is the distribution of the variables Z_n when X_k can have arbitrary, the same, or different distributions.
Theorem 1.
If
- (1) the random variables X_k, k = 1, 2, …, are independent,
- (2) an auxiliary source provides the random numbers r_n with a uniform distribution,
- (3) P(X_k = x) > 0 for every x from [0, p − 1],
- (4) P(X_k = x) < 1 for every x from [0, p − 1],
then the random variable Z_n has a uniform distribution in the interval [0, p − 1] for R → ∞.
Proof. Independently of k and the value of r_n, the random variable Z_n, n = 1, 2, …, is a bitwise exclusive-or sum of R + 1 independent random variables with theoretically arbitrary, the same, or different distributions. Consequently, proving the distribution of Z_n, n = 1, 2, …, can be reduced to proving the distribution of the random variable
Z = X_1 ⊕ X_2 ⊕ … ⊕ X_{R+1},
where X_1, X_2, …, X_{R+1} are independent with arbitrary, the same, or different distributions. Because the values of X_j, j = 1, 2, …, R + 1, are encoded by l bits, the random variable X_j can also be written unequivocally as a sequence of l binary random variables X_j^(i), i = 1, 2, …, l, i.e., X_j = (X_j^(1), …, X_j^(l)). Similarly, the values of Z are also encoded by l bits, and Z can be written unequivocally as a sequence of l binary random variables Z^(i), i = 1, 2, …, l, i.e., Z = (Z^(1), …, Z^(l)). It also holds that
Z^(i) = X_1^(i) ⊕ X_2^(i) ⊕ … ⊕ X_{R+1}^(i), i = 1, 2, …, l.
To prove that Z has a uniform distribution, we first show that
P(Z^(i) = 0) = P(Z^(i) = 1) = 1/2 for R → ∞,
for i = 1, 2, …, l, where P denotes the probability. The key to this proof is a calculation trick and the methodology introduced by R. B. Davies in a private paper [3]. The link to this page can also be found at the National Institute of Standards and Technology (NIST) webpage [25]. If X_j^(i) takes the values of 0 and 1, then (−1)^{X_j^(i)} takes the values of 1 and −1. Consequently [3],
(−1)^{Z^(i)} = (−1)^{X_1^(i)} (−1)^{X_2^(i)} ⋯ (−1)^{X_{R+1}^(i)}.
If P(X_j^(i) = 1) denotes the probability that X_j^(i) assumes the value 1, then P(X_j^(i) = 0) = 1 − P(X_j^(i) = 1). Assuming that E[·] is the expected value (mean) of a random variable, we obtain the following:
E[(−1)^{Z^(i)}] = P(Z^(i) = 0) − P(Z^(i) = 1) = 1 − 2P(Z^(i) = 1).
If X_j^(i), j = 1, 2, …, R + 1, are independent, then
E[(−1)^{Z^(i)}] = Π_{j=1}^{R+1} E[(−1)^{X_j^(i)}].
Because
E[(−1)^{X_j^(i)}] = P(X_j^(i) = 0) − P(X_j^(i) = 1) = 1 − 2P(X_j^(i) = 1),
we obtain
1 − 2P(Z^(i) = 1) = Π_{j=1}^{R+1} (1 − 2P(X_j^(i) = 1)),
or
P(Z^(i) = 1) = (1 − Π_{j=1}^{R+1} e_j^(i))/2,
where
e_j^(i) = 1 − 2P(X_j^(i) = 1).
Considering the sign of e_j^(i),
|1 − 2P(Z^(i) = 1)| = Π_{j=1}^{R+1} |e_j^(i)|.
Based on the assumption, the random variables X_k can assume any value from the interval [0, p − 1] with a nonzero probability, and no value from [0, p − 1] is “certain,” i.e., it cannot be assumed with a probability equal to unity. Consequently, any subsequence of an l-bit sequence also appears with a nonzero and smaller-than-unity probability, and the variables X_j^(i), j = 1, 2, …, R + 1, assume both zero and one for each i. In this case, the probability P(X_j^(i) = 1) is always nonzero and smaller than unity, i.e.,
|e_j^(i)| < 1.
Because the product of numbers with absolute values smaller than unity decreases to zero as the quantity of multiplied numbers increases, we obtain
Π_{j=1}^{R+1} |e_j^(i)| → 0 for R → ∞.
The variable Z^(i) takes only two values, zero or one. This leads to the following equality:
P(Z^(i) = 0) = P(Z^(i) = 1) = 1/2 for R → ∞.
Because the random variables X_k, k = 1, 2, …, assume all values from [0, p − 1] with a nonzero and smaller-than-unity probability and P(Z^(i) = 0) = P(Z^(i) = 1) = 1/2 for all i = 1, 2, …, l, the variable Z has a uniform distribution in [0, p − 1] for R approaching infinity. This completes the proof.
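The geometric decay at the heart of the proof is easy to evaluate numerically: the bit bias of the XOR sum is the product of the individual biases 1 − 2P(X_j = 1). A sketch with an illustrative bit probability of 0.3 (our choice, not the paper's):

```python
def xor_bit_bias(p_ones):
    """E[(-1)^Z] = product of (1 - 2 * P(X_j = 1)) for the XOR Z of independent bits."""
    b = 1.0
    for p1 in p_ones:
        b *= 1.0 - 2.0 * p1
    return b

# XOR of R + 1 bits with P(X_j = 1) = 0.3: P(Z = 1) = (1 - bias) / 2 tends to 1/2
for R in (1, 4, 9, 19):
    bias = xor_bit_bias([0.3] * (R + 1))
    print(R, round((1.0 - bias) / 2.0, 6))
```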
Let us emphasize that the assumption that X_k, k = 1, 2, …, assumes all values from [0, p − 1] with a nonzero and smaller-than-unity probability is necessary to draw the conclusion that equal probabilities of assuming zeros and ones by Z^(i) result in a uniform distribution of Z. This assumption eliminates all of the specific situations, i.e., combining variables with fixed (probability equal to unity) values or the lack of some numbers (zero probability). As an example, let us consider the case of l = 2. If X_k assumes, e.g., only two values, 3 and 0, each with a probability of 1/2 (the numbers 1 and 2 are assumed with zero probability), then the result of the combination is the number 3 or 0, independent of R. The number 3 is obtained if we combine an odd number of number 3s, and the number 0 is obtained for an even number of number 3s, although P(Z^(i) = 0) = P(Z^(i) = 1) = 1/2 for i = 1, 2.
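This degenerate case can be reproduced by enumeration: XOR combinations of values drawn only from {0, 3} never produce 1 or 2, so the sum is not uniform on [0, 3] even though both of its bits are individually balanced.

```python
from itertools import product

# XOR of R + 1 values drawn from {0, 3} (l = 2): only 0 and 3 can ever occur.
for R in (1, 2, 3):
    results = set()
    for combo in product([0, 3], repeat=R + 1):
        z = 0
        for v in combo:
            z ^= v
        results.add(z)
    print(R, sorted(results))  # always [0, 3]; the values 1 and 2 never occur
```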
The words read from T are chosen from the words that were already produced by the random source S. Theorem 1 shows that perfect uniformity (the entropy equal to l bits/time-step) can only be obtained in the limit R → ∞, i.e., for a table T with an infinite number of cells. In a practical system, L is finite, and consequently, the entropy can only be close to the theoretical limit. Assuming a fixed R, which results from the assumed acceptable value of the entropy of the sequence {z_n}, the smallest L is equal to R·2^m (L = R·2^m). A small m enables some of the words chosen in step n to repeat for the next several iterations with a large probability. A greater value for m reduces this probability but increases the size of table T. The smallest acceptable m should ensure that the smallest distance, i.e., 2^m, repeats statistically no more frequently than every R iterations when we use r_n + 1 as the distance multiplier. This condition can be written as
P(r_n = 0) ≤ 1/R, (21)
where P is the probability. Because r_n has a uniform distribution, Condition (21) takes the following form:
2^{−m} ≤ 1/R, (22)
or
m ≥ log_2 R. (23)
Because m must be an integer, inequality (23) takes the form
m = ⌈log_2 R⌉, (24)
where ⌈a⌉ is the smallest integer that is greater than or equal to a. A given distance repeats statistically every 2^m iterations for 2^m = R, or more rarely, when 2^m > R.
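The condition above fixes the table parameters from R alone. A quick check of the smallest acceptable m = ⌈log₂ R⌉ and L = R·2^m against the (R, m, L) pairs listed with the tables of the numerical experiment:

```python
import math

def table_params(R):
    """Smallest acceptable m and the resulting table size L for a given R."""
    m = math.ceil(math.log2(R))
    return m, R * 2 ** m

# (R, (m, L)) pairs, matching those listed with Tables 1-3
for R in (2, 3, 4, 5, 10, 20, 30):
    print(R, table_params(R))
```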
The basic weakness of algorithm XOR-A is the necessity of using an additional source of randomness that provides the random numbers r_n with a uniform distribution. Because the proof of the proposed theorem indicates that such numbers are available at the output of algorithm XOR-A, they can be used to produce the numbers r_n. We exploit here the fact that any subsequence of digits of a random number can be used to form another random number. Therefore, if z_n is random, then any sequence formed from the bits of z_n is also random [19]. Assuming that
r_n = trunc_m(z_{n−1}), (25)
where the truncation operation trunc_m leaves the m higher-order bits of the number z_{n−1}, the algorithm XOR-A can be modified in the following way:
Algorithm XOR-B
Initialization:
Choose the size L of an auxiliary Table T;
Fill all of the cells of T with the L subsequent l-bit words produced by S;
Computations:
end;
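A Python sketch of algorithm XOR-B is given below. The table layout assumes L = R·2^m with m = ⌈log₂ R⌉, the number r_n is taken from the m higher-order bits of the previous output word (fixed to unity in the first iteration, as in the numerical example of this section), and the address arithmetic is our assumption, not reproduced from the paper.

```python
import math, random

def xor_b(source, n_out, R, l):
    """Sketch of Algorithm XOR-B: XOR-A with r_n fed back from the output."""
    m = math.ceil(math.log2(R))             # smallest acceptable m
    assert m <= l                           # r_n must fit into an l-bit output word
    L = R * 2 ** m                          # table size L = R * 2^m
    T = [next(source) for _ in range(L)]    # Initialization: fill T
    out, r = [], 1                          # r is fixed to 1 in the first iteration
    for n in range(n_out):
        x = next(source)                    # current l-bit word x_n
        z = x
        for i in range(1, R + 1):           # assumed addressing, as for XOR-A
            z ^= T[(n + i * 2 ** m + r) % L]
        T[n % L] = x                        # write x_n cyclically into T
        out.append(z)
        r = z >> (l - m)                    # truncation: m higher-order bits of z_n
    return out

rng = random.Random(7)
source = iter(lambda: min(rng.randrange(256), rng.randrange(256)), None)
z = xor_b(source, 1000, R=4, l=8)
print(len(z), all(0 <= v < 256 for v in z))  # 1000 True
```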
A limitation for algorithm XOR-B is the number R of words that are read from T. Because the m bits of r_n must fit into an l-bit output word, parameter R must satisfy the condition
m = ⌈log_2 R⌉ ≤ l. (26)
One of the factors determining the utility of a proposed algorithm is its computational complexity. The framework for the complexity of computations is contained in a classical paper by M. O. Rabin [26]. In algorithm XOR-B, the following operations on l-bit words can be extracted during the production of a single l-bit word z_n:
(i) division of two l-bit numbers with a remainder,
(ii) multiplication of two l-bit numbers,
(iii) addition of two l-bit numbers,
where operation (i) can be reduced to multiplication by an inverse number. Assuming that the elementary operation is addition, the complexity of the computations of (i) and (ii) can be equal to O(l^2), O(l^{log_2 3}) or O(l log l log log l), where O is one of the family of Bachmann-Landau notations known as “Big O” [27,28]. The concrete value depends on the assumed multiplication algorithm. The complexity of the computation of (iii) is O(l); as previously, the details depend on the addition algorithm assumed. Algorithm XOR-B also uses the logic operation XOR on l-bit words. The computational complexity of this operation can be reduced to l multiplications of one-bit numbers with addition in the Galois field GF(2). The complexity of this operation is O(l). The dominating operation in algorithm XOR-B is:
z_n = x_n ⊕ T[a_1] ⊕ T[a_2] ⊕ … ⊕ T[a_R], (27)
where a_1, a_2, …, a_R denote the addresses of the words read from T. The assignment from Equation (27) can be factored into R operations XOR of the form
z_n ← z_n ⊕ T[a_j]. (28)
The complexity of Equation (28) is equal to
O(C·R·l), (29)
where C is a constant. Omitting the constant C, Equation (29) is reduced to
O(R·l). (30)
Because the dominating operation is multiplication, the computational complexity of algorithm XOR-B can be assessed as O(l^2) for a simple multiplication of l-bit numbers, O(l^{log_2 3}) for the multiplication of l-bit numbers with Karatsuba’s algorithm, and O(l log l log log l) for the Schönhage–Strassen algorithm. When algorithm XOR-B is implemented in hardware supporting the multiplication of l-bit numbers, the time of a multiplication can be the same as that of the elementary operation. Consequently, the complexity of algorithm XOR-B can be reduced to that of Equation (28), i.e., O(R·l). It is the same as the complexity of the hash algorithms with the same word length (l = 32 for SHA-1 and SHA-256/224 and l = 64 for SHA-512/384).
The total computational effort depends on both l and the parameter R. Parameter R does not depend on l and is assumed to be a fixed value for algorithm XOR-B. The computational effort determines the total time of computations and is usually measured in the clock cycles necessary to execute an algorithm; in our case, the production of a single number z_n. It requires R operations XOR defined by Equation (28), one load of an l-bit word, one computation of r_n, one truncation, and one substitution. A detailed comparison of XOR-B with SHA-1 and SHA-2 is reliable only when the same software and hardware platform is used for all of the algorithms; we limit our considerations to a general comment, because such a comparison is not a subject of this paper. For a very large R, greater than the 80 rounds used by SHA-1 and SHA-512/384 or the 64 rounds used by SHA-256/224, algorithm XOR-B can theoretically be slower. However, such a high value of R indicates that some numbers occur at the source with a very small probability. There is no proof that SHA-1 and SHA-2 always provide a uniform distribution at the output for biased raw sequences [29].
Example
Theorem 1 indicates that the simplest method for increasing the entropy of a sequence is to group its elements into blocks with R + 1 elements and compute the bitwise exclusive-or sum of all of the elements of each block. This method is numerically inefficient because to obtain an N-element sequence with a greater entropy, a random source must produce N(R + 1) symbols. Using an auxiliary table and the symbols already produced by the source S, the production of an N-element sequence with theoretically the same entropy requires emitting only N + L symbols, where L = R·2^m, m = ⌈log_2 R⌉. For example, if N = 1000 and R = 10, then the source must produce 11,000 symbols for the first method and 1160 symbols when algorithm XOR-B is used.
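The symbol counts in this example follow directly from the two formulas, N(R + 1) for grouping and N + R·2^m with m = ⌈log₂ R⌉ for algorithm XOR-B:

```python
import math

def symbols_grouping(N, R):
    """Source symbols needed to output N words by XOR-ing (R + 1)-element blocks."""
    return N * (R + 1)

def symbols_xor_b(N, R):
    """Source symbols needed by algorithm XOR-B: N plus the L = R * 2^m words
    used to fill the auxiliary table T, with m = ceil(log2(R))."""
    return N + R * 2 ** math.ceil(math.log2(R))

print(symbols_grouping(1000, 10))  # 11000
print(symbols_xor_b(1000, 10))     # 1160
```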
To compare the different methods and assess the uniformity of the distribution of the numbers obtained using the proposed theorem, a pseudorandom number generator with a Gaussian distribution of the generated numbers was used. The Gaussian distribution occurs in many physical phenomena and plays a key role in the Shannon model of communication. As a source of independent numbers, we used a pseudorandom number generator from the Mathematica package [30]. The instruction RandomVariate[NormalDistribution[μ, σ]] provides numbers with a Gaussian distribution with an average μ and standard deviation σ. Because the obtained numbers are both positive and negative, we first compute their absolute values. Next, we consider numbers from the assumed interval encoded by l = l_int + l_fr bits, where l_int is the number of bits that encode the integer part and l_fr is the number of bits that encode the fractional part. If a generated number does not belong to the assumed interval (generated numbers can, theoretically, assume values from minus to plus infinity), then we repeat the generation to obtain a number from this interval. The output is the bitwise exclusive-or sum of the R + 1 numbers x_n for which the values belong to the assumed interval. The distance between the addresses of the words read from T changes randomly for each iteration. This additional randomness comes from the higher-order bits of the output number z_{n−1}, excluding the first iteration, when it is fixed and equal to unity. In the experiment, l_fr is fixed and equal to 16 bits. The value of l_int is changed to obey the three exemplary ranges of the Gaussian distribution: (0, 2σ], (0, 3σ], and (0, 4σ]. Consequently, l_int = 1 for the range (0, 2σ], and l_int = 2 for the ranges (0, 3σ] and (0, 4σ]. It was also assumed that μ = 0 and σ = 1. The size of the table L is equal to R·2^m, where m = ⌈log_2 R⌉.
To assess the uniformity of the obtained distributions, a chi-square test was used. This test measures the agreement between the empirical distribution and the theoretical distribution for a given number of degrees of freedom (number of categories) and significance level. The value of the statistic χ² is computed using the formula [18,19]
χ² = Σ_{i=1}^{H} (N_i − N·P_i)² / (N·P_i),
where H is the number of categories, N_i is the number of samples from the i-th category, P_i is the probability that each sample falls into category i, and N is the total number of samples. The critical value of the chi-square test for 50 categories and the significance level of 0.01, chosen in the numerical experiment, is equal to 74.92 (we do not assess the parameters of the distribution). The results of the tests for the ranges (0, 2σ], (0, 3σ], (0, 4σ], and the two methods of producing uniformly distributed random numbers are summarized in Table 1, Table 2 and Table 3.
The results shown in Table 1 indicate that we require only four numbers from the interval (0, 2σ] to form uniformly distributed numbers from the numbers with a Gaussian distribution with μ = 0 and σ = 1 for both methods. When the probability of producing certain numbers from the assumed interval decreases, the parameter R increases.
Table 1. The values of the χ² statistic for the range (0, 2σ].

Grouping subsequent elements into (R + 1)-element blocks:

| R | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| χ² | 226.41 | 76.42 | 55.28 | 39.59 | 53.14 | 51.69 |

Algorithm XOR-B:

| R | 1 (m = 0, L = 1) | 2 (m = 1, L = 4) | 3 (m = 2, L = 12) | 4 (m = 2, L = 16) | 5 (m = 3, L = 40) | 6 (m = 3, L = 48) |
|---|---|---|---|---|---|---|
| χ² | 219.81 | 97.13 | 37.19 | 37.52 | 45.13 | 34.85 |
Table 2. The values of the χ² statistic for the range (0, 3σ].

Grouping subsequent elements into (R + 1)-element blocks:

| R | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|
| χ² | 109.98 | 85.61 | 65.79 | 21.93 | 33.99 | 48.16 |

Algorithm XOR-B:

| R | 4 (m = 2, L = 16) | 5 (m = 3, L = 40) | 6 (m = 3, L = 48) | 7 (m = 3, L = 56) | 8 (m = 3, L = 64) | 9 (m = 4, L = 144) |
|---|---|---|---|---|---|---|
| χ² | 173.56 | 86.24 | 54.96 | 41.76 | 38.86 | 51.23 |
Table 3. The values of the χ² statistic for the range (0, 4σ].

Grouping subsequent elements into (R + 1)-element blocks:

| R | 5 | 10 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|
| χ² | 4400.60 | 383.53 | 80.57 | 50.69 | 51.81 | 48.97 |

Algorithm XOR-B:

| R | 5 (m = 3, L = 40) | 10 (m = 4, L = 160) | 15 (m = 4, L = 240) | 20 (m = 5, L = 640) | 25 (m = 5, L = 800) | 30 (m = 5, L = 960) |
|---|---|---|---|---|---|---|
| χ² | 3589.55 | 362.47 | 75.39 | 64.20 | 50.29 | 53.58 |
Considering the numbers from the interval (0, 3σ], we must sum the bitwise exclusive-or of at least seven numbers to pass the chi-square test. Exploiting the numbers from the tail of the Gaussian distribution, i.e., the range (0, 4σ], we need 21 or more numbers for both methods. The use of algorithm XOR-B requires significantly fewer source numbers to produce an N-element output sequence than grouping subsequent elements into (R + 1)-element blocks.
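The χ² statistic used throughout these tables is straightforward to compute; a minimal sketch for H equiprobable categories with illustrative counts (74.92 is the critical value quoted in the text):

```python
def chi_square(counts, probs):
    """Chi-square statistic: sum over the H categories of (N_i - N*P_i)^2 / (N*P_i)."""
    N = sum(counts)
    return sum((n_i - N * p_i) ** 2 / (N * p_i) for n_i, p_i in zip(counts, probs))

H = 50
probs = [1.0 / H] * H   # equiprobable categories, as in the experiment

print(chi_square([200] * H, probs) < 1e-9)                 # True: uniform counts give ~0
print(chi_square([150] * 25 + [250] * 25, probs) > 74.92)  # True: clearly non-uniform
```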
4. Conclusions
In this paper, a novel method for increasing the entropy of a sequence of discrete random variables has been proposed. The elements of this sequence are a bitwise exclusive-or sum of independent random variables that model the output of a random source. The proposed algorithms use an auxiliary table and a novel theorem proved in this paper. The theorem and the algorithms provide an efficient method for converting a sequence of independent random variables with a practically arbitrary distribution into a sequence of independent and uniformly distributed random variables, i.e., with an entropy close to maximal. The algorithm XOR-B does not need an additional source of randomness and requires only (N + L)/N input words per output word. Grouping the subsequent words into (R + 1)-element blocks requires R + 1 input words per output symbol.
The areas for applications include dynamical systems, especially chaotic systems as a source of random numbers, chaos-based cryptography, traditional cryptography, statistics, simulation experiments, and many other fields in which random sequences with an entropy close to the maximal value are needed.