1. Introduction
Cryptography and security applications make extensive use of random bits, such as keys and initialization vectors in encryption and nonces in security protocols. The generation of random bits should be designed with the security goal of indistinguishability from an ideal randomness source, which generates identically distributed and independent uniform random bits with full entropy (i.e., one bit of entropy per bit). However, this security goal is challenging to achieve. The list of real-world failures, where a random bit generator (RBG) is broken and the security of the reliant application crumbles with it, continues to grow [1,2,3,4]. There are two fundamentally different strategies for designing RBGs. One strategy is to produce bits non-deterministically, where every bit of output is based on a physical process that is unpredictable; the other strategy is to compute bits deterministically using an algorithm from an initial value that contains sufficient entropy to provide an assurance of randomness. This class of RBG is referred to as pseudo-random bit generators (PRBGs) or deterministic random bit generators (DRBGs) [5]. Due to their deterministic nature, PRBGs produce sequences of pseudo-random (rather than random) bits. Real-world PRBGs usually employ cryptographic primitives, such as stream ciphers, block ciphers, hash functions and elliptic curves, as their basic building blocks. For instance, the revised NIST SP 800-90A standard [5] recommends Hash-DRBG, HMAC-DRBG and CTR-DRBG based on approved hash functions and block ciphers. It is worth mentioning that, as pointed out in [2], the NIST SP 800-90A standard is not free from controversy: the disgraced algorithm DualEC-DRBG in the standard was reported to contain a back door [3]; the recommended DRBGs, when supporting a variety of optional inputs and parameters, do not fit cleanly into the usual security models of PRBGs; there is no formal competition in the standardization process; and there is only a limited amount of formal analysis of the recommended DRBGs.
Although not included in the NIST standard, PRBGs that use stream ciphers based on feedback shift registers (FSRs) as building blocks have several advantages, including the simple structure of addition and multiplication operations, fast and easy hardware implementations in almost all computing devices, and good statistical characteristics (long period, balance, run properties, etc.) in output sequences [6,7]. In addition, the output sequence from any other PRBG will ultimately become periodic and can therefore be produced by certain FSRs. This provides another perspective for investigating the security strength of output sequences from PRBGs.
Consider a pseudo-random sequence generated from a PRBG. A natural question arises: how should one evaluate the randomness of the sequence? Research on this problem can be traced back to the early 20th century. As early as 1919, Mises [8] initiated the notion of random sequences based on his frequency theory of probability. Later, Church [9] pointed out the vagueness of the second condition on randomness in [8] and proposed a less restricted but more precise definition of random sequences, as follows: an infinite binary sequence $a_1 a_2 a_3 \ldots$ is a random sequence if it satisfies the following two conditions: (1) if $S_r$ denotes the number of 1s among the first r terms, then the ratio $S_r/r$ approaches a limit p as r approaches infinity; (2) for any effectively calculable function $\varphi$ that decides, based on the first $n-1$ terms, whether the n-th term is selected, if the selected integers n form an infinite subsequence, then the proportion of 1s among the first r selected terms approaches the same limit p as r approaches infinity. Despite the emphasis on the calculability of $\varphi$ in Condition (2), one can see that it is hardly (if at all) possible to test the randomness of a sequence by this definition. It could be used in randomness testing by negative outcomes, namely, a binary sequence is not random if one can find an effectively calculable function $\varphi$ for which Condition (2) does not hold. In order to justify a proposed definition of randomness, one has to show that the sequences that are random in the stated sense possess the various properties of stochasticity with which we are acquainted in probability theory. Inspired by the properties obeyed by maximum-length sequences, Golomb proposed the randomness postulates of binary sequences [10]: balancedness, the run property (a subsequence of consecutive 1s/0s in a sequence is termed a run; a maximum-length sequence of period $2^m - 1$ contains one run of m 1s, one run of $m-1$ 0s and, for each length k with $1 \le k \le m-2$, exactly $2^{m-k-2}$ runs of 1s and $2^{m-k-2}$ runs of 0s) and ideal autocorrelation. Kolmogorov [11] proposed the notion of program-length complexity to test the randomness of a sequence, defined as the length of the shortest program that can produce the sequence on a universal Turing machine A. This notion is later referred to as the Kolmogorov complexity in the literature. Along this line, further developments were made in [12,13] and summarized in Knuth's famous book series The Art of Computer Programming [14].
Rather than considering an abstract Turing machine program that generates a given sequence, the model of using FSRs to generate a given sequence has attracted considerable attention. This model has two major advantages: firstly, all sequences that are generated by a finite-state machine in a deterministic manner are ultimately periodic and, as such, can be produced by certain finite-state shift registers; secondly, it is comparatively easier and more efficient to identify the shortest FSRs producing a given sequence, either of finite length or of infinite length with a certain period. Due to its tractability, the linear complexity of a specific sequence, which uses linear FSRs (LFSRs) in place of the universal machine A in the Kolmogorov complexity, is particularly appealing and has been intensively studied. The linear complexity of a given sequence can be efficiently calculated by the Berlekamp–Massey algorithm [15,16], and the stochastic behavior of linear complexity for finite-length sequences and periodic sequences can be considered to be fairly well understood [17,18,19]. This good understanding of the linear complexity of sequences is an important factor for its adoption in the NIST randomness test suite [20]. Note that extremely long sequences with large linear complexity can usually be generated by much shorter FSRs with certain nonlinear feedback functions. Consequently, further figures of merit for judging the randomness of sequences, such as maximum-order complexity [21], 2-adic complexity [22], quadratic complexity [23], expansion complexity [24] and their variants, have also been explored in the literature.
This paper will survey the development of complexity measures used to assess the randomness of sequences. It is important to note that this paper does not intend to offer a comprehensive survey of this broad topic. Instead, it aims to provide a preliminary overview of the topic with technical discussions to a certain extent, focusing mainly on complexities within the domain of FSRs that align with the author's research interests. To this end, I delved into significant findings on the study of FSR-based complexity measures, particularly on the algorithmic and algebraic methods in computation, statistical behavior, theoretical constructions and the relations among those complexity measures. Research papers on this topic presented at flagship conferences in cryptography, such as ASIACRYPT, CRYPTO and EUROCRYPT, published in prestigious journals with IEEE, Springer, Science Direct, etc., as well as some well-recognized books since the 1970s constitute the major content of this survey. Readers may refer to relevant surveys on this topic with different focuses by Meidl and Niederreiter [19,25] and Topuzoğlu and Winterhof [26].
The remainder of this paper is organized as follows: Section 2 recalls the basics of feedback shift registers, which lays the foundation for the discussions in subsequent sections. Section 3 reviews important developments in the theory of the linear complexity profile of random sequences, Section 4 discusses a mathematical tool used to efficiently calculate the quadratic complexity of sequences and Section 5 surveys computational methods, the statistical behavior of maximum-order complexity and theoretical constructions of periodic sequences with the largest maximum-order complexity. In Section 6, known relations among complexity measures of sequences are briefly summarized. Finally, concluding remarks and discussion are given in Section 7.
2. Feedback Shift Registers
Let $\mathbb{F}_q$ be the finite field of q elements, where q is an arbitrary prime power. For a positive integer m, an m-stage feedback shift register (FSR) is a clock-controlled circuit consisting of m consecutive storage units and a feedback function f, as displayed in Figure 1. Starting with an initial state $(s_0, s_1, \ldots, s_{m-1})$ over $\mathbb{F}_q$, the states of the FSR are updated by a clock-controlled transformation as follows:
$$(s_i, s_{i+1}, \ldots, s_{i+m-1}) \longmapsto (s_{i+1}, s_{i+2}, \ldots, s_{i+m}),$$
where $s_{i+m} = f(s_i, s_{i+1}, \ldots, s_{i+m-1})$, and the leftmost symbol of each state is output. In this way an FSR produces a sequence $s = s_0 s_1 s_2 \ldots$ based on each initial state $(s_0, s_1, \ldots, s_{m-1})$ and its feedback function f. The output sequence from an FSR, known as an FSR sequence, can be equivalently expressed as a sequence of states $S_0, S_1, S_2, \ldots$, with the relation $S_i = (s_i, s_{i+1}, \ldots, s_{i+m-1})$ for $i \ge 0$. When $S_p = S_0$ for the least positive integer p, we obtain a cycle of states $S_0, S_1, \ldots, S_{p-1}$, or equivalently a sequence $s$ of least period p. In his influential book [10], Golomb intensively studied the properties of FSR sequences with both linear and nonlinear feedback functions. Readers can refer to [10] for a comprehensive understanding of FSR sequences.
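As a small illustration of the clocking process just described, the following Python sketch simulates a generic m-stage FSR; the function names and the example nonlinear feedback are ours, not taken from [10].

```python
def fsr_sequence(f, initial_state, length):
    """Run an m-stage FSR: output the leftmost symbol, then shift and feed back f(state)."""
    state = list(initial_state)
    out = []
    for _ in range(length):
        out.append(state[0])
        state = state[1:] + [f(state)]   # clock-controlled state transformation
    return out

# A 3-stage nonlinear example over F_2: f(x0, x1, x2) = x0 + x1*x2 (mod 2)
f = lambda st: (st[0] + st[1] * st[2]) % 2
print(fsr_sequence(f, [0, 0, 1], 20))
```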
For an m-stage FSR with a feedback function f and an initial state $(s_0, s_1, \ldots, s_{m-1})$, if we know n consecutive terms $s_i, s_{i+1}, \ldots, s_{i+n-1}$ in the output, then we obtain $n-m$ equations
$$s_{i+j+m} = f(s_{i+j}, s_{i+j+1}, \ldots, s_{i+j+m-1}), \quad 0 \le j \le n-m-1. \qquad (2)$$
When the feedback function f is linear, namely, $f(x_0, x_1, \ldots, x_{m-1}) = c_0 x_0 + c_1 x_1 + \cdots + c_{m-1} x_{m-1}$ with $c_j \in \mathbb{F}_q$, one can uniquely determine its m unknown coefficients $c_0, c_1, \ldots, c_{m-1}$ from the above linear equations once sufficiently many terms (e.g., $n \ge 2m$) are known. When the feedback function f is nonlinear, the number of terms in f increases significantly. For instance, for $q = 2$, a binary feedback function f of degree r has $\sum_{i=0}^{r}\binom{m}{i}$ possible terms. The above equations in Equation (2) then become more difficult to analyze: the equations are not necessarily linearly independent and many more variables are involved. Consequently, more observations and techniques are required in the analysis of these equations. In the following, we will review some results on FSR sequences when the feedback function is linear, quadratic and arbitrarily nonlinear, as well as their relations with other complexity measures.
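To make the linear case concrete, the following Python sketch (the helper names are illustrative) recovers the feedback coefficients of an m-stage binary LFSR from 2m consecutive output bits by Gaussian elimination over GF(2); it assumes the m-by-m matrix of successive windows is invertible, which holds in particular when the observed segment has linear complexity m.

```python
def solve_gf2(A, b):
    """Solve A x = b over GF(2); A is a list of rows (lists of 0/1 values)."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]          # augmented matrix
    for col in range(m):
        piv = next((r for r in range(col, m) if M[r][col]), None)
        if piv is None:
            raise ValueError("coefficient matrix is singular for this window")
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def recover_lfsr_coefficients(bits, m):
    """Recover c_0..c_{m-1} with s_{i+m} = c_0 s_i + ... + c_{m-1} s_{i+m-1} (mod 2)."""
    assert len(bits) >= 2 * m
    A = [bits[i:i + m] for i in range(m)]   # m successive state windows
    b = [bits[i + m] for i in range(m)]     # the corresponding next output bits
    return solve_gf2(A, b)

# Bits produced by s_{i+4} = s_i + s_{i+3} (an m-sequence of period 15)
bits = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(recover_lfsr_coefficients(bits, 4))   # expected [1, 0, 0, 1]
```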
We will consider both finite-length sequences and infinite-length sequences with a certain finite period. Throughout what follows, we will use $s$ to denote a generic sequence with terms from a certain alphabet, $s^n = s_0 s_1 \ldots s_{n-1}$ to denote a sequence of length n that is not a repetition of any shorter sequence, $(s^n)^k$ to denote the sequence consisting of k repetitions of $s^n$ and $(s^n)^\infty$ to denote infinite repetitions of $s^n$, indicating that $(s^n)^\infty$ has least period n.
3. Linear Complexity
Given an FSR, when its feedback function f is a linear function of the input state, namely, it is associated with the following linear recurrence:
$$s_{i+m} = c_0 s_i + c_1 s_{i+1} + \cdots + c_{m-1} s_{i+m-1}, \quad i \ge 0,$$
where $c_0, c_1, \ldots, c_{m-1}$ are taken from $\mathbb{F}_q$, it is termed a linear FSR and the output sequence $s$ is called an LFSR sequence of order m. The polynomial $x^m - c_{m-1}x^{m-1} - \cdots - c_1 x - c_0$, with coefficients in $\mathbb{F}_q$, is called the feedback or characteristic polynomial of $s$. The zero sequence is viewed as an LFSR sequence of order 0.

Definition 1. Let $s$ be an arbitrary sequence of elements of $\mathbb{F}_q$ and let n be a non-negative integer. Then the n-th linear complexity $L_n(s)$ is defined as the length of the shortest LFSR that can generate the first n terms $s^n$. The sequence $L_1(s), L_2(s), L_3(s), \ldots$ is called the linear complexity profile of the sequence $s$.

It is clear that $0 \le L_n(s) \le n$ and $L_n(s) \le L_{n+1}(s)$ for any integer n and sequence $s$. Hence, the linear complexity profile of $s$ is a nondecreasing sequence of non-negative integers. The two extreme cases for $L_n(s)$ correspond to highly non-random sequences whose first n elements are either $0, 0, \ldots, 0$ or $0, 0, \ldots, 0, 1$ (with n-1 zeros followed by a single one), which have $L_n(s) = 0$ or $L_n(s) = n$, respectively.
The linear complexity of a sequence can be efficiently calculated by the well-known Berlekamp–Massey algorithm [15,16], which also returns a linear feedback function generating the sequence. In the Berlekamp–Massey algorithm, at each step n, with the current linear recurrence for $s^{n-1}$, a discrepancy is calculated to assess whether the linear recurrence still holds for the extended sequence $s^n$: when the discrepancy is zero, indicating that the current linear recurrence indeed holds for $s^n$, one moves on to the next term; when the discrepancy is nonzero, indicating that a new (and possibly longer) linear recurrence is needed, the linear recurrence is updated and the linear complexity is updated as
$$L_n(s) = \max\{L_{n-1}(s),\ n - L_{n-1}(s)\}.$$
Gustavson [27] showed that the above process requires $O(n^2)$ multiplication/addition operations for calculating $L_n(s)$. Dornstetter, in [28], pointed out the equivalence between this procedure of calculation and the Euclidean algorithm.
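The following Python sketch (binary case only, with our own variable names) implements this discrepancy-driven procedure and returns the linear complexity profile $L_1(s), \ldots, L_n(s)$.

```python
def linear_complexity_profile(bits):
    """Berlekamp-Massey over GF(2): return [L_1, ..., L_n] for the binary sequence 'bits'."""
    n = len(bits)
    c = [1] + [0] * n        # current connection polynomial C(x)
    b = [1] + [0] * n        # previous connection polynomial B(x)
    L, m = 0, -1             # current complexity and index of the last complexity change
    profile = []
    for i in range(n):
        d = bits[i]                          # discrepancy of the current recurrence at step i
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:                                # recurrence fails: C(x) <- C(x) + x^{i-m} B(x)
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:                   # complexity jumps to i + 1 - L
                L, m, b = i + 1 - L, i, t
        profile.append(L)
    return profile

print(linear_complexity_profile([0, 0, 0, 1]))   # [0, 0, 0, 4], the extreme profile
print(linear_complexity_profile([0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]))  # ends at 4
```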
Rueppel [17] investigated the varying behavior of the discrepancy at each step for a binary sequence of length n, and characterized the linear complexity profile of random sequences in the following propositions.

Proposition 1. The number of sequences of length n and linear complexity l, denoted by $N_n(l)$, satisfies a recursive relation and, starting from the base case, can be given in closed form.

Proposition 2. The expected linear complexity of binary random sequences of length n drawn from a uniform distribution is close to $n/2$, with a small correction term depending on n modulo 2 and an exponentially small term in n, and the variance of the linear complexity is bounded by a constant independent of n.

From the above propositions, we readily see that the linear complexity of a random binary sequence of length n concentrates around $n/2$.
From Rueppel's characterization of the general stochastic behavior of binary random sequences, a similar and more important question arose: for a randomly chosen and then fixed sequence $s$ over $\mathbb{F}_q$, what is the behavior of its linear complexity profile? To settle this question, Niederreiter [18] developed a probabilistic theory for the linear complexity and linear complexity profiles of a given sequence $s$ over any finite field $\mathbb{F}_q$. More specifically, he applied techniques from probability theory to dynamical systems and continued fractions, and deduced probabilistic limit theorems for linear complexity by exploiting the connection between continued fractions and the linear complexity of a sequence $s$. Below we first recall the connection discussed in [18].

Given a sequence $s = s_0 s_1 s_2 \ldots$, the associated formal Laurent series $S(x) = \sum_{i \ge 0} s_i x^{-i-1}$ has the continued fraction expansion
$$S(x) = \cfrac{1}{A_1(x) + \cfrac{1}{A_2(x) + \cfrac{1}{A_3(x) + \cdots}}},$$
where the partial quotients $A_1(x), A_2(x), \ldots$ are polynomials over $\mathbb{F}_q$ with positive degrees. The n-th linear complexity of $s$ can be expressed as
$$L_n(s) = \deg(A_1) + \deg(A_2) + \cdots + \deg(A_j)$$
for the integer j satisfying
$$\sum_{i=1}^{j-1}\deg(A_i) + \sum_{i=1}^{j}\deg(A_i) \;\le\; n \;\le\; \sum_{i=1}^{j}\deg(A_i) + \sum_{i=1}^{j+1}\deg(A_i) - 1.$$
By the study of the above continued fraction expansion, Niederreiter obtained the following result on $L_n(s)$.
Theorem 1. Suppose f is a non-negative nondecreasing function on the positive integers satisfying a suitable summability condition. Then, for almost all sequences $s$ over $\mathbb{F}_q$, the deviations of $L_n(s)$ from $n/2$ exceed $f(n)$ either for only finitely many n or for infinitely many n, according to whether an associated series converges or diverges.

By taking $f(n) = c\log n$ for arbitrary $c > 0$, a more explicit law of the logarithm for the n-th linear complexity of a random sequence $s$ can be obtained. As the extreme deviations in this law are attained for infinitely many n, there is a strong interest in sequences $s$ whose n-th linear complexity has small deviations from $n/2$.
Definition 2. Let d be a positive integer. A sequence $s$ is said to be d-perfect if $L_n(s) \ge (n + 1 - d)/2$ for all $n \ge 1$. A 1-perfect sequence is also called perfect. A sequence is called almost perfect if it is d-perfect for some integer d.

Constructing
d-perfect sequences is of significant interest. Sophisticated methods based on algebraic curves were introduced in [29], which yielded sequences with almost perfect linear complexity profiles. For cryptographic applications, especially as a keystream, a sequence $s$ should have a linear complexity profile close to that of a random sequence. Moreover, this condition should hold from any starting point of the sequence. That is to say, for a sequence $s$ and any shift $r \ge 1$, the r-shifted sequence $s_r s_{r+1} s_{r+2} \ldots$ should have an acceptable linear complexity profile, close to that of random sequences, as well. Niederreiter and Vielhaber [30] attacked this problem with the help of continued fractions. An important fact is that the jumps in the linear complexity profile of $s$ are exactly the degrees of the partial quotients $A_i(x)$. With such a connection, they proposed an algorithm to determine the linear complexity profiles of shifted sequences by investigating the corresponding continued fractions [31]. To be more concrete, they designed a method to calculate the continued fraction expansions of the Laurent series associated with the r-shifted sequences for all shifts r. Later, in his survey paper [19], Niederreiter proposed the following open problem on d-perfect sequences.

Problem 1. Construct a sequence over $\mathbb{F}_q$ such that the r-shifted sequences for all shifts r are d-perfect for some integer d.
The preceding discussion is mainly concerned with generic infinite-length sequences. Note that sequences from an m-stage LFSR over $\mathbb{F}_q$ are periodic and their maximum period is $q^m - 1$, which is achieved when the characteristic polynomial is a primitive polynomial of degree m over $\mathbb{F}_q$. LFSR sequences of order m with period $q^m - 1$ are thus known as maximum-length sequences, or m-sequences for short. For a periodic sequence, the values in its linear complexity profile will remain unchanged from a certain point on. This reveals the apparently different linear complexity profiles of an infinite-length sequence with a certain period and a random infinite-length sequence, which basically can be considered to have an arbitrarily long period.
An important tool for the analysis of the linear complexity of n-periodic sequences over $\mathbb{F}_q$ is the discrete Fourier transform. Assume $\gcd(n, q) = 1$, which means that there exists a primitive n-th root of unity $\beta$ in some finite extension of $\mathbb{F}_q$. Then the discrete Fourier transform of a time-domain n-tuple $(s_0, s_1, \ldots, s_{n-1})$ is the (frequency-domain) n-tuple $(\hat{s}_0, \hat{s}_1, \ldots, \hat{s}_{n-1})$ with
$$\hat{s}_k = \sum_{i=0}^{n-1} s_i \beta^{ik}, \quad 0 \le k \le n-1. \qquad (8)$$
In this case, the linear complexity of an n-periodic sequence can be determined via the discrete Fourier transform [32].
Proposition 3. Let $s$ be an n-periodic sequence over $\mathbb{F}_q$ with $\gcd(n, q) = 1$. Let β be a primitive n-th root of unity in an extension of $\mathbb{F}_q$ and let $(\hat{s}_0, \ldots, \hat{s}_{n-1})$ be the discrete Fourier transform of $(s_0, \ldots, s_{n-1})$ as in Equation (8). Then the linear complexity of $s$ equals the number of nonzero components of the transform, i.e., the Hamming weight of the DFT.

Massey and Serconek in [33] further extended the above relation to the general case $\gcd(n, q) > 1$ with the generalized discrete Fourier transform. The generalized discrete Fourier transform (GDFT) of $(s_0, s_1, \ldots, s_{n-1})$, where $n = p^v w$ with $\gcd(w, p) = 1$ and p is the characteristic of $\mathbb{F}_q$, is defined as the $p^v \times w$ matrix
$$\Big[\, s^{[t]}(\beta^{k}) \,\Big]_{0 \le t \le p^v - 1,\; 0 \le k \le w - 1},$$
where β is any primitive w-th root of unity over $\mathbb{F}_q$, $s(x) = s_0 + s_1 x + \cdots + s_{n-1}x^{n-1}$ and $s^{[t]}(x)$ is the t-th Hasse derivative of the polynomial $s(x)$. It is clear that when $v = 0$, the GDFT of $(s_0, \ldots, s_{n-1})$ reduces to the discrete Fourier transform of $(s_0, \ldots, s_{n-1})$. As pointed out in [33,34], the linear complexity of $s$ can be calculated in terms of the Günther weight of the GDFT; this result is referred to as the Günther–Blahut theorem and was used implicitly by Blahut in [35].

Proposition 4. Let $s$ be an n-periodic sequence over $\mathbb{F}_q$ of characteristic p, where $n = p^v w$ with $\gcd(w, p) = 1$. Then the linear complexity of $s$ is equal to the Günther weight of the GDFT of the n-tuple $(s_0, s_1, \ldots, s_{n-1})$, where the Günther weight of a matrix M is the number of its entries that are nonzero or that lie below a nonzero entry in the same column.
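To illustrate Proposition 3 in the simplest setting, the sketch below works over a prime field chosen so that it already contains a primitive n-th root of unity (i.e., $n \mid p - 1$, an assumption made for convenience); it counts the nonzero DFT coefficients and cross-checks the result with a brute-force search for the shortest linear recurrence. The parameters and names are illustrative.

```python
from itertools import product

def element_order(g, p):
    x, k = g % p, 1
    while x != 1:
        x, k = (x * g) % p, k + 1
    return k

def primitive_nth_root(n, p):
    for g in range(2, p):
        if element_order(g, p) == n:
            return g
    raise ValueError("GF(p) contains no primitive n-th root of unity (need n | p-1)")

def dft_linear_complexity(seq, p):
    """Proposition 3: linear complexity = number of nonzero DFT coefficients."""
    n = len(seq)
    beta = primitive_nth_root(n, p)
    dft = [sum(seq[i] * pow(beta, i * k, p) for i in range(n)) % p for k in range(n)]
    return sum(1 for v in dft if v != 0)

def brute_force_linear_complexity(seq, p, periods=3):
    """Smallest L with a recurrence s_{i+L} = c_0 s_i + ... + c_{L-1} s_{i+L-1} (mod p)."""
    s = seq * periods
    for L in range(len(seq) + 1):
        for coeffs in product(range(p), repeat=L):
            if all(s[i + L] % p == sum(c * s[i + j] for j, c in enumerate(coeffs)) % p
                   for i in range(len(s) - L)):
                return L
    return len(seq)

seq, p = [1, 3, 2], 7     # period-3 sequence over GF(7); 3 divides 7 - 1
print(dft_linear_complexity(seq, p), brute_force_linear_complexity(seq, p))   # both 3
```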
Algebraically, the linear complexity of an n-periodic sequence $s = (s^n)^\infty$ can be obtained via the feedback polynomial of the LFSR generating $s$ [36]. Consider the generating function
$$S(x) = \sum_{i=0}^{\infty} s_i x^i = \frac{s_0 + s_1 x + \cdots + s_{n-1} x^{n-1}}{1 - x^n}.$$
Reducing this rational function to lowest terms by cancelling the greatest common divisor of the numerator and the denominator identifies the shortest LFSR generating $s$: the degree of the reduced denominator is the linear complexity, and the reduced denominator determines (up to reciprocation) the minimal polynomial of $s$, i.e., the characteristic polynomial of the shortest LFSR. This leads to the following result [36].

Proposition 5. Let $s = (s^n)^\infty$ be an n-periodic sequence over $\mathbb{F}_q$ and let $s^n(x) = s_0 x^{n-1} + s_1 x^{n-2} + \cdots + s_{n-2}x + s_{n-1}$ be the polynomial associated with $s^n$. Then the minimal polynomial of $s$ is given by
$$m(x) = \frac{x^n - 1}{\gcd\big(x^n - 1,\ s^n(x)\big)},$$
indicating that
$$L(s) = n - \deg\big(\gcd(x^n - 1,\ s^n(x))\big).$$

With the above neat expression of $L(s)$ for periodic sequences, several families of sequences with a nice algebraic structure were investigated, such as Legendre sequences [37], sequences from the discrete logarithm function [38] and Lempel–Cohn–Eastman sequences [39]. For more information about the linear complexity of sequences with special algebraic structures, readers are referred to Shparlinski's book [40].
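As a small illustration of Proposition 5 over GF(2), the following sketch (bit-packed polynomials stored as Python integers; helper names are ours) computes the linear complexity of an n-periodic binary sequence as $n - \deg\gcd(x^n + 1, s^n(x))$; note that the order in which the coefficients of $s^n(x)$ are taken does not affect the degree of the gcd.

```python
def poly_deg(p):
    """Degree of a GF(2) polynomial stored as an int (bit i = coefficient of x^i); deg(0) = -1."""
    return p.bit_length() - 1

def poly_mod(a, b):
    """Remainder of a divided by b over GF(2)."""
    db = poly_deg(b)
    while poly_deg(a) >= db:
        a ^= b << (poly_deg(a) - db)
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a

def periodic_linear_complexity(period_bits):
    """Linear complexity of the binary sequence obtained by repeating 'period_bits' forever."""
    n = len(period_bits)
    s = sum(bit << i for i, bit in enumerate(period_bits))   # s(x) = s_0 + s_1 x + ...
    if s == 0:
        return 0
    return n - poly_deg(poly_gcd((1 << n) | 1, s))           # (1 << n) | 1 encodes x^n + 1

# The m-sequence 000111101011001 of period 15 has linear complexity 4.
print(periodic_linear_complexity([0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]))  # 4
```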
The above discussion is concerned with the explicit calculation of the linear complexity of an n-periodic sequence. The statistical behavior of a random n-periodic sequence over $\mathbb{F}_q$ is of fundamental interest, particularly from the cryptographic perspective. Rueppel in [41] considered this problem: if we let $s^n$ run through all n-periodic binary sequences, what is the expected linear complexity $E_n$ of $(s^n)^\infty$? For the case $q = 2$, Rueppel showed that if n is a power of 2, then $E_n = n - 1 + 2^{-n}$, and he derived a similar explicit expression for another special form of n involving a prime m. Based on observations of numerical results, he conjectured that $E_n$ is always close to n. There was little progress on this conjecture until the work by Meidl and Niederreiter [42], in which they studied the expected linear complexity of random n-periodic sequences over $\mathbb{F}_q$ for an arbitrary prime power q with the help of the above Günther–Blahut theorem and an analysis of cyclotomic cosets. For an integer j with $0 \le j < w$, the cyclotomic coset of j modulo w (with respect to q) is defined as $C_j = \{\, j q^t \bmod w : t = 0, 1, 2, \ldots \,\}$.
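For concreteness, a short sketch (illustrative only) that partitions $\{0, 1, \ldots, w-1\}$ into cyclotomic cosets modulo w with respect to q:

```python
def cyclotomic_cosets(w, q):
    """Partition {0, ..., w-1} into cyclotomic cosets C_j = {j*q^t mod w} with respect to q."""
    remaining, cosets = set(range(w)), []
    while remaining:
        j = min(remaining)
        coset, x = [], j
        while x not in coset:
            coset.append(x)
            x = (x * q) % w
        cosets.append(sorted(coset))
        remaining -= set(coset)
    return cosets

print(cyclotomic_cosets(15, 2))
# [[0], [1, 2, 4, 8], [3, 6, 9, 12], [5, 10], [7, 11, 13, 14]]
```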
Theorem 2. Let $n = p^v w$, where p is the characteristic of $\mathbb{F}_q$ and $\gcd(w, p) = 1$. Let $C_{j_1}, \ldots, C_{j_h}$ be the different cyclotomic cosets modulo w and let $c_i = |C_{j_i}|$ for $1 \le i \le h$. Then the expected linear complexity of a random n-periodic sequence over $\mathbb{F}_q$ can be expressed explicitly in terms of n, q and the coset sizes $c_1, \ldots, c_h$.

From Theorem 2, a routine inequality scaling implies a lower bound on the expected linear complexity that is close to n. In the particular case $\gcd(n, q) = 1$, the bound simplifies further. For small q, the above lower bounds can be further improved. For instance, for $q = 2$, as the only singleton coset is $C_0 = \{0\}$, it follows that the resulting lower bound is very close to n. Recall that the expected linear complexity of a random binary sequence of length n is around $n/2$. For a periodic sequence $(s^n)^\infty$ obtained from a sequence $s^n$ with linear complexity l, the calculation of the linear complexity of $(s^n)^\infty$ needs to consider the expanded sequence $(s^n)^2$ of length 2n. This expansion sequence, when considered as a random sequence, would have expected complexity around n. This is somewhat consistent with the bounds above when n is odd. Further improved results for the case $q = 2$ were obtained by more detailed analysis and can be found in [42].
4. Quadratic Complexity
Quadratic feedback functions are the simplest nonlinear case and have been discussed by several researchers [43,44,45]. Chan and Games, in [43], studied the quadratic complexity of binary DeBruijn sequences of order m (in which each binary m-tuple appears exactly once in one period); Youssef and Gong [44] characterized a jump property of the quadratic complexity profile of random sequences; and Rizomiliotis et al., in [45], proposed an efficient method to calculate the quadratic complexity of binary sequences in general. We first recall the definition of the quadratic complexity of a sequence.
Definition 3. Given a binary sequence $s$, its quadratic complexity $Q(s^n)$ is the length of the shortest FSR with a quadratic feedback function (a feedback function of algebraic degree at most 2, including the constant and linear terms) that can generate $s^n$. The quadratic complexity profile of the sequence $s$ is similarly defined as $Q(s^1), Q(s^2), Q(s^3), \ldots$, where $Q(s^n)$ is the quadratic complexity of the first n terms of $s$.

For the first n terms $s^n$ of $s$, suppose it can be generated by an m-stage quadratic FSR. Following the recurrence as in Equation (2), we can obtain a system of $n - m$ equations. More concretely, each equation expresses a known term $s_{i+m}$ as the value of the unknown quadratic feedback function on the preceding m terms; collecting the values of the constant, linear and quadratic monomials on the corresponding windows of the sequence yields a coefficient matrix $G_m$, so that the unknown vector c of feedback coefficients satisfies a linear system $G_m c = b$ over $\mathbb{F}_2$, where b consists of the terms $s_m, s_{m+1}, \ldots, s_{n-1}$. Therefore, finding a quadratic feedback function in m variables that outputs the sequence $s^n$ is equivalent to solving the above system of linear equations in $1 + m + \binom{m}{2}$ variables over $\mathbb{F}_2$. Notice that the ordering of the linear and quadratic terms can be chosen so that a quadratic term occurs only after the corresponding linear terms occur. This feature facilitates the calculation of $Q(s^n)$ term by term, as in the Berlekamp–Massey algorithm.
We first give a toy example to illustrate the above linear equations. Let m = 3 and n = 9, so that the quadratic feedback function has $1 + 3 + \binom{3}{2} = 7$ unknown coefficients. From the relation $G_3 c = b$, we obtain 6 equations with 7 variables in $\mathbb{F}_2$. It is straightforward to see that the system is solvable if and only if the rank of $G_3$ equals the rank of the augmented matrix $(G_3 \mid b)$.
Chan and Games in [43] made the following observation.

Theorem 3. If an m-stage quadratic FSR can generate $s^n$ but not $s^{n+1}$, then there is no m-stage quadratic FSR that can generate $s^{n+1}$ if and only if an explicit rank condition on the associated matrices holds.

By this theorem, at each step, when $s^n$ has quadratic complexity m, if the condition is not met, then the extended sequence, for any value of the appended term, also has the same quadratic complexity m. Based on such an observation, they proposed a term-by-term algorithm, similar to the Berlekamp–Massey algorithm, to calculate the quadratic complexity of a sequence [23]. Their algorithm strongly depends on Gaussian elimination for the computation of the relevant ranks for each new term. The special structure of the matrix $G_m$ was not well taken into consideration in their algorithm, which therefore did not reveal the precise increment of the quadratic complexity of $s$. By looking into a certain structure of the matrix $G_m$, Youssef and Gong [44] showed that if the quadratic complexity of the first n terms of a sequence $s$ is larger than a certain threshold, then the first several additional terms of $s$ have the same quadratic complexity. Rizomiliotis, Kolokotronis and Kalouptsidis in [45] observed a nesting structure of the matrix $G_m$: the matrix for m+1 stages contains the matrix for m stages as a submatrix, together with the rows and columns contributed by the new variable (the columns indexed by the new linear and quadratic terms), including the starting all-one column corresponding to the constant term. This nesting structure played an important role in their derivation of the following results.
Theorem 4. Suppose the quadratic complexity increases at the new term. Then the new quadratic complexity is the smallest integer m for which the corresponding rank condition is satisfied.

Theorems 3 and 4 laid the foundation for an algorithmic method to calculate the quadratic complexity of a sequence. More specifically, for each new term, Theorem 3 determines when the quadratic complexity will increase, and Theorem 4 characterizes how large the increment is. Combining the different cases for each new term, Figure 2 provides a preliminary algorithm to recursively assess the quadratic complexity of a sequence $s$. It can be seen from Figure 2 that the assessment heavily depends on the calculation of the ranks of the involved matrices, which becomes slower as n and m grow.
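The rank-based viewpoint can be made concrete with a brute-force sketch (illustrative only, and not the optimized algorithm of [45]): for increasing m it builds the matrix of constant, linear and quadratic monomials evaluated on the m-windows of the sequence and returns the smallest m for which the resulting system over GF(2) is consistent.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a list of rows given as int bitmasks."""
    pivots = {}                          # highest set bit -> reduced row with that pivot
    for r in rows:
        while r:
            hb = r.bit_length() - 1
            if hb in pivots:
                r ^= pivots[hb]
            else:
                pivots[hb] = r
                break
    return len(pivots)

def quadratic_complexity(bits):
    """Smallest m such that some feedback function of degree <= 2 in m variables generates 'bits'."""
    n = len(bits)
    for m in range(1, n):
        monomials = [()] + [(j,) for j in range(m)] + list(combinations(range(m), 2))
        A, Ab = [], []
        for i in range(n - m):
            window = bits[i:i + m]
            row = 0
            for col, mono in enumerate(monomials):
                if all(window[j] for j in mono):      # value of the monomial on this window
                    row |= 1 << col
            A.append(row)
            Ab.append(row | (bits[i + m] << len(monomials)))   # augmented with the RHS bit
        if gf2_rank(A) == gf2_rank(Ab):               # consistent system: some solution exists
            return m
    return max(n - 1, 0)

print(quadratic_complexity([0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]))  # 4
```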
With a detailed analysis of the nesting structure of $G_m$, the authors in [45] proposed a more efficient algorithm (as in [45], Figure 3) to calculate the quadratic complexity profile of $s$. In addition, when the number of equations is smaller than the number of unknown coefficients, the system $G_m c = b$ is under-determined; otherwise, a necessary and sufficient condition for a unique quadratic feedback function generating the sequence can be given [45].

Theorem 5. The quadratic feedback function of an m-stage FSR that generates the sequence $s^n$ is unique if and only if the coefficient matrix $G_m$ has full column rank. Otherwise, there are $2^k$ such functions, where k is the number of free variables of the system.

This theorem illustrates that a binary sequence with small quadratic complexity m should not be used for cryptographic applications: when sufficiently many consecutive terms of the sequence are known, the quadratic feedback function could be (uniquely) determined, thereby producing the whole sequence and violating the requirement of unpredictability for sequences used in cryptography.
In the previous section, we saw a good theoretical understanding of the statistical behavior of the linear complexity and linear complexity profile of random sequences. However, to the best of my knowledge, there is no published theoretical result on the statistical behavior of the quadratic complexity and the quadratic complexity profile of random sequences $s^n$ and n-periodic sequences $(s^n)^\infty$, except for the following two conjectures by Youssef and Gong [44], which are strongly indicated by numerical results.

Conjecture 1. Let $N_n(c)$ be the number of binary sequences of length n with quadratic complexity c. Then, for all n and c in the relevant range, the counts satisfy a simple relation, indicating that $N_n(c)$ is essentially a function of a single combination of n and c.

Conjecture 2. For moderately large n, the expected value of the quadratic complexity of a random binary sequence of length n is close to $\sqrt{2n}$ (the threshold at which the number of unknown feedback coefficients matches the number of available equations).

5. Maximum-Order Complexity
As an additional figure of merit for randomness testing, Jansen and Boekee in 1989 proposed the notion of maximum-order complexity, also later known as nonlinear complexity, of sequences [46]. We adopt the term maximum-order complexity in this survey for better distinction from the notion of quadratic complexity.

Definition 4. The maximum-order complexity of a sequence $s$, denoted by $M(s)$, is the length of the shortest FSR that can generate the sequence $s$. Similarly, the maximum-order complexity profile of $s$ is the sequence $M(s^1), M(s^2), M(s^3), \ldots$, where, for each n, $M(s^n)$ is the length of the shortest FSR that can generate the first n terms of $s$.

As pointed out in ref. [46], the significance of the maximum-order complexity is that it tells exactly how many terms of $s$ have to be observed at least in order to be able to generate the entire sequence by means of an FSR of length $M(s)$. Therefore, it has been considered an important measure for judging the randomness of sequences. Below we recall some basic properties of the maximum-order complexity of sequences [46,47].
Lemma 1. Let $s^n$ be a sequence over $\mathbb{F}_q$.
(i) Let l be the length of the longest subsequence of $s^n$ that occurs at least twice with different successors. Then the sequence has maximum-order complexity $M(s^n) = l + 1$;
(ii) The maximum-order complexity of any n-length sequence is at most $n - 1$. Moreover, equality holds if and only if $s^n$ has the form $a a \cdots a b$ with $a \ne b$ in $\mathbb{F}_q$.

Lemma 2. Let $(s^n)^\infty$ be a sequence of period n over $\mathbb{F}_q$.
(i) The maximum-order complexity of $(s^n)^\infty$ satisfies $M((s^n)^\infty) \le n - 1$;
(ii) The reciprocal sequence, obtained by reading the period in reverse order, has the same maximum-order complexity.

From Lemmas 1 and 2, the difference between the nonlinear complexities of finite-length sequences and periodic sequences is apparent. One typical difference is that the maximum-order complexity of a periodic sequence remains unchanged under cyclic shift operations, while that of a finite-length sequence can change dramatically. For instance, the sequence $0 0 \cdots 0 1$ of length n has $M = n - 1$, while, after a right cyclic shift, we obtain $1 0 0 \cdots 0$ with $M = 1$.
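As a direct, quadratic-time illustration of the definition (the function name is ours), the sketch below computes the maximum-order complexity of a finite binary sequence as the smallest m for which equal m-tuple windows are always followed by the same symbol, which by Lemma 1(i) equals one plus the length of the longest repeated subsequence with different successors.

```python
def maximum_order_complexity(seq):
    """Smallest m such that equal m-windows in 'seq' always have the same successor.
    Conventions differ for constant sequences; this returns 0 for them."""
    n = len(seq)
    for m in range(n):
        followers = {}
        ok = True
        for i in range(n - m):
            w = tuple(seq[i:i + m])
            if followers.setdefault(w, seq[i + m]) != seq[i + m]:
                ok = False
                break
        if ok:
            return m
    return max(n - 1, 0)

print(maximum_order_complexity([0, 0, 0, 1]))           # 3, the extreme case of Lemma 1(ii)
print(maximum_order_complexity([1, 0, 0, 0]))           # 1, after a right cyclic shift
print(maximum_order_complexity([0, 1, 1, 0, 1, 0, 1]))  # 3
```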
Recall that the linear feedback function of the shortest LFSR generating a periodic sequence is unique, and that the quadratic feedback function is also unique when the associated matrix is nonsingular; the situation for maximum-order complexity is significantly different.

Proposition 6. Let $\mathcal{F}$ denote the class of feedback functions of the m-stage FSRs that can generate a given periodic sequence with maximum-order complexity m over $\mathbb{F}_q$. Then the number of functions in $\mathcal{F}$ is equal to $q^{q^m - n}$, where n is the least period of the sequence, i.e., the number of distinct m-tuples occurring in it.

By the above proposition, the class $\mathcal{F}$ generally contains more than one feedback function that can generate the periodic sequence, and the situation is similar for non-periodic sequences. One can search for functions exhibiting certain properties, such as the function with the least number of terms. One of the methods that minimize the number of terms and their orders in the inclusive-or sum of products of variables or their complements is to use the disjunctive normal form (DNF) representation of the feedback function. For the binary case, the DNF of f is the sum of the minterms corresponding to the m-tuples on which f takes the value 1, where the minterm of an m-tuple is the product of the m variables or their complements that evaluates to 1 exactly on that tuple. It is also interesting to note that a unique feedback function occurs for DeBruijn sequences of order m, which contain each m-tuple over $\mathbb{F}_2$ exactly once in one period of length $2^m$. For binary DeBruijn sequences of order m, Chan and Games in [23] showed that their quadratic complexity is upper bounded by an explicit function of m, and that the bound is attained by the DeBruijn sequences derived from m-sequences by inserting a 0 into the all-zero subsequence of length $m-1$ of the m-sequences.
5.1. Computation and Statistical Behavior
In ref. [46], Jansen associated $M(s)$ with the maximum depth of a directed acyclic word graph (DAWG) built from $s$. Below, we recall a toy example from [46] of generating a DAWG from a binary string, which helps readers better understand the relevant notions.

Example 1. Consider the binary example sequence of [46], whose subsequences and their endpoints are listed in Table 1 (λ represents the empty sequence). Subsequences are considered equivalent if they have the same set of endpoints. For instance, there are three equivalence classes whose endpoints are 4, 5 and 6, respectively. In total, the 17 subsequences of the example sequence are divided into 9 equivalence classes. It is customary to choose the shortest subsequence in each class as the class representative. From these representatives, we can build a DAWG as follows: each representative is denoted by a node, and λ is considered the source node; there is a directed edge from one node to another if and only if the equivalence class of the first node contains a subsequence which, when extended by the edge symbol, belongs to the equivalence class of the second node; the edges are divided into primary and secondary edges, where an edge is primary if and only if it belongs to a primary path (a longest path from the source to the node), and the depth of a node is the length of the primary path to the node, the length of a path being the number of edges along it. Let D be the subset of nodes with more than one outgoing primary edge, and define the maximum depth as the largest depth of a node in D.

In this way, we obtain the DAWG of the example sequence as in Figure 3, where primary edges are displayed as solid arrows. From the definition of the depth of a node, the maximum depth can be read off the graph, and the maximum-order complexity of the sequence then follows from Proposition 7 below.
Figure 3. Directed acyclic word graph of the sequence in Example 1 [21].
With the notions illustrated in Example 1, Jansen in [21] established the following connection between the maximum-order complexity of a sequence $s$ and the maximum depth of the DAWG derived from $s$, and proposed to calculate the maximum-order complexity of $s$ from its DAWG.

Proposition 7. Given a sequence $s$, the maximum-order complexity of $s$ and the maximum depth $d_{\max}$ of its DAWG satisfy $M(s) = d_{\max} + 1$.

Blumer's algorithm [48] was identified as an important tool to build the DAWG of a sequence $s$, thereby calculating its maximum-order complexity profile in linear time and memory with respect to the sequence length. The method works particularly well for periodic sequences. With this algorithm, Jansen exhausted all $2^l$ binary l-length sequences and l-periodic sequences for l ranging from 1 up to 24 ([21], Tables 3.1–3.4), which exhibited the typical statistical behavior of maximum-order complexity profiles of random sequences: the expected maximum-order complexity of a random sequence $s^n$ of sufficiently large length n over $\mathbb{F}_2$ is approximately $2\log_2 n$.
Following the successful approach in ref. [45], Rizomiliotis and Kalouptsidis [49] considered the calculation of the maximum-order complexity of a sequence $s$ in a similar way. From the recursive equations $s_{i+m} = f(s_i, s_{i+1}, \ldots, s_{i+m-1})$ for $0 \le i \le n-m-1$, one obtains a similar system of linear equations
$$G_m c = b,$$
where the coefficient matrix $G_m$ is formed from all $2^m$ binary monomial terms $x^a = \prod_{j:\, a_j = 1} x_j$, with a ranging over $\mathbb{F}_2^m$, evaluated on the m-tuples of the sequence and arranged in a special ordering according to the degree of the terms. For small m and n, the system can be written out explicitly, with the coefficient matrix determined by the m-tuples occurring in the sequence. In ref. [49], the authors investigated the properties of the m-tuples $(s_j, \ldots, s_{j+m-1})$ as j runs through the indices $0, 1, \ldots, n-m-1$ and proposed a term-by-term algorithm to compute the maximum-order complexity and output a feedback function for a given sequence $s^n$. They also analyzed the order of the dominant multiplication complexity of their proposed algorithm.
Attempting the above approach by merely analyzing the structure of $G_m$ is not satisfactory enough. Later, in ref. [50], the authors made further observations that improved the performance of calculating the maximum-order complexity of sequences.

Proposition 8. For a binary sequence $s$, suppose $M(s^n) = m$ and the minimal FSR of $s^n$ does not generate $s^{n+1}$. Then $M(s^{n+1}) = m$ if and only if the subsequence $(s_{n-m}, \ldots, s_{n-1})$ has not appeared earlier within $s^{n-1}$.

The above observation is a natural extension of the property of maximum-order complexity in Lemma 1 (i). In turn, the following observation characterizes the jump of the maximum-order complexity profile at each term.
Proposition 9. For a binary sequence $s$, suppose the maximum-order complexity increases at the (n+1)-th term. Let k be the least integer such that the suffix $(s_k, \ldots, s_{n-1})$ has occurred earlier in $s^n$ followed by a symbol different from $s_n$. Then $M(s^{n+1}) = n - k + 1$. Moreover, if this complexity is large enough relative to n, then any further extension of the sequence has the same maximum-order complexity.

In addition, they observed that the maximum-order complexity profile of a sequence has a close connection to its eigenword profile. To be more concrete, the eigenwords of a sequence $s^n$ are those subsequences of $s^n$ that are not contained in any proper subsequence of $s^n$. They showed the following interesting connection.

Theorem 6. For a binary sequence $s$, suppose $M(s^n) = m$ and the minimal FSR of $s^n$ does not generate $s^{n+1}$. Then $M(s^{n+1})$ can be expressed in terms of m, n and the number of eigenwords in the sequence $s^n$.

From the observations on how the feedback function is updated when the minimal FSR of $s^n$ does not generate $s^{n+1}$, they proposed the following procedure to generate a minimal feedback function and the maximum-order complexity of a sequence. Suppose $M(s^n) = m$ and the minimal feedback function of $s^n$ is f. Then:
if f also generates $s^{n+1}$, then $s^n$ and $s^{n+1}$ share the minimal feedback function and the same maximum-order complexity m;
if f does not generate $s^{n+1}$ and the maximum-order complexity does not increase, then f is updated by modifying its value on the newly conflicting state;
if f does not generate $s^{n+1}$ and the maximum-order complexity increases, then a minimal feedback function of the new, larger order is obtained from f by a well-determined correction.
Consequently, they proposed an efficient algorithm for the maximum-order complexity, which works very similarly to the Berlekamp–Massey algorithm. For ease of comparison with the Berlekamp–Massey algorithm, we recall it in Algorithm 1. The computational complexity of Algorithm 1 is considerably lower in the average case than in the worst case, since the expected maximum-order complexity of a random sequence is only about $2\log_2 n$. While it has similar complexity to the DAWG method [21], its recursive nature is an important advantage, since it eliminates the need to know the entire sequence in advance. This resembles the well-known Berlekamp–Massey algorithm in the linear case: if a discrepancy occurs at a certain step, then a well-determined corrective function is added to the current minimal feedback function of $s^n$ to provide an updated minimal feedback function of $s^{n+1}$. In addition, the update of the maximum-order complexity at each step is similar to the update $L_n = \max\{L_{n-1}, n - L_{n-1}\}$ of the linear complexity in the recursive Berlekamp–Massey procedure.

Algorithm 1: Generation of a minimal feedback function for $s^n$ (see [50] for the pseudocode).
5.2. Approximate Statistical Behavior
Conducting a rigorous theoretical analysis of the behavior of maximum-order complexity profiles, as was done for linear complexity, appeared intractable. On the other hand, maximum-order complexity has many similarities to linear complexity. In order to facilitate randomness testing with the maximum-order complexity, Erdmann and Murphy proposed a method to approximate the distribution of the maximum-order complexity [51]. Inspired by the properties of the maximum-order complexity in Lemmas 1 and 2, they investigated a function that approximates the probability that the first n m-tuples in a sequence are all different.

Proposition 10. Let $p_{n,m}$ denote the probability that the n-th m-tuple is unique given that the previous m-tuples were unique. Then $p_{n,m}$ admits an explicit approximation of birthday-paradox type.

Simulations on random sequences for m from 4 to 24 ([51], Table 2) indicated that the above approximation is accurate, particularly for the larger values of m.
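A quick simulation (illustrative, not taken from [51]) that compares the empirical mean maximum-order complexity of random binary sequences with the $2\log_2 n$ rule of thumb mentioned above:

```python
import math
import random

def moc(seq):
    """Naive maximum-order complexity: smallest m with a consistent m-window -> successor map."""
    n = len(seq)
    for m in range(n):
        seen = {}
        if all(seen.setdefault(tuple(seq[i:i + m]), seq[i + m]) == seq[i + m]
               for i in range(n - m)):
            return m
    return max(n - 1, 0)

random.seed(1)
for n in (64, 256, 1024):
    trials = [moc([random.randint(0, 1) for _ in range(n)]) for _ in range(200)]
    print(n, sum(trials) / len(trials), 2 * math.log2(n))
```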
Recall that a purely periodic sequence has maximum-order complexity m if the first n m-tuples are unique but at least one of the first n (m-1)-tuples is repeated. Thus, for calculating the maximum-order complexity of a periodic sequence, it suffices to look only at the first $n + m - 1$ bits to see whether the n m-tuples are unique and at the first $n + m - 2$ bits to see whether the n (m-1)-tuples are not unique. Denote by $P_{n,m}$ the probability that the first n m-tuples are unique while the n (m-1)-tuples are not. This probability function can be well approximated by an explicit expression derived from Proposition 10. Based on this approximation (the accuracy of which was shown in ([51], Table 3)), the expected maximum-order complexity was approximated as follows.

Theorem 7. Let M be the maximum-order complexity of a random periodic binary sequence of period n. Then its expected value admits an explicit approximation that grows logarithmically in n.

Similarly, for random binary sequences of length n, the expected maximum-order complexity was approximated as below.

Theorem 8. Let $A_n(m)$ be the number of binary sequences of length n with maximum-order complexity m. Then $A_n(m)$ can be approximated by an explicit expression, and the expected maximum-order complexity of a random binary sequence of length n is, to this approximation, close to $2\log_2 n$.

5.3. Sequences with High Maximum-Order Complexity
Different complexity measures were proposed in the literature to evaluate the randomness of sequences. A sequence with low complexity, whether linear, quadratic or maximum-order complexity, allows short FSRs to generate the whole period of the sequence. That is to say, all remaining unknown terms of such a sequence can be efficiently uncovered once the feedback function and the initial state of the short FSR are determined. It is clear that sequences of this kind should be avoided in any cryptographic application. On the other hand, the relation between high complexity of a sequence and good randomness of a sequence is not yet well understood.

For the aforementioned complexity measures, the sequences of the form $0 0 \cdots 0 1$ have the largest complexity but clearly have poor randomness. According to the exhaustive search results on maximum-order complexity in Tables 1–3 in [21], while sequences of this form are the only instances of n-length sequences that have the highest possible complexity $n-1$, there are multiple n-periodic binary sequences having the highest maximum-order complexity $n-1$.
Researchers have been interested in the study of sequences with high maximum-order complexity [52,53], and particularly in sequences having the highest possible maximum-order complexity [54,55]. For the latter case, Rizomiliotis [49] in 2005 proposed the following necessary and sufficient condition for an n-periodic binary sequence to have maximum-order complexity equal to its linear complexity.
Proposition 11. For a periodic binary sequence $s = (s^n)^\infty$, let $s^n(x)$ be the polynomial representation of $s^n$ and consider its relation to $x^n - 1$. Then the maximum-order complexity of $s$ equals its linear complexity if and only if there exist integers satisfying an explicit system of conditions determined by $s^n(x)$.

Based on the above condition, different families of n-periodic binary sequences that have the same linear complexity and maximum-order complexity were proposed in ref. [49]. Moreover, an algorithm based on Lagrange interpolation and an algorithm based on the relative shift of the component sequences were proposed to generate binary sequences with the highest possible maximum-order complexity. These two ideas were further developed for constructing such periodic sequences with periods of a special form [54]. Roughly 10 years later, n-periodic sequences with the highest maximum-order complexity were revisited in ref. [55], which provided a complete picture of such sequences. The authors of ref. [55] gave a necessary and sufficient condition for n-periodic sequences over any alphabet to have maximum-order complexity $n-1$.

Theorem 9. Let $s = (s^n)^\infty$ be an n-periodic sequence over any field. Then $M(s) = n - 1$ if and only if there exists an integer p with $1 \le p < n$ such that the terms of $s$ satisfy an explicit pair of coincidence conditions determined by p. Furthermore, such a sequence can, up to shift equivalence, be represented in one of two explicit forms:
(1) a form parameterized directly by p;
(2) a form parameterized by a further integer, where the parameters are derived from n and p by the Euclidean algorithm.

With the full characterization in Theorem 9, the distribution of random n-periodic binary sequences having maximum-order complexity $n-1$ can be derived as follows.
Proposition 12. The probability that a randomly generated n-periodic binary sequence has the highest maximum-order complexity $n-1$ is given by an explicit expression involving Euler's totient function and the Möbius function. In particular, for special values of n the probability takes a simple closed form.
characterized in Theorem 9 exhibit a strong recursive structure, which can be derived by applying the Euclidean algorithm on
n and the integer
p, a smaller period of a subsequence of
. This strong recursive structure, on the other hand, implies that
is far from being random. As discussed in ref. [
55], the balancedness, stability and scalability of such sequences are not good, indicating that they should be avoided for cryptographic applications. Very recently, binary sequences with periodic
n were further studied in ref. [
56], where the authors further investigated the structure of
and proposed an algorithmic method to determine all
n-periodic binary sequences with maximum-order complexity
.
In addition, Liang et al. recently investigated the structure of n-length binary sequences with high maximum-order complexity and proposed an algorithm that can completely generate all such binary sequences [53]. Based on this completeness, they managed to provide an explicit expression for the number of n-length binary sequences with any maximum-order complexity between a certain threshold and n.

Proposition 13. For m in the stated range, the number of n-length binary sequences with maximum-order complexity m is given by an explicit formula, expressed in terms of the factorization of an associated integer.