1. Introduction
De Bruijn sequences have been investigated for decades [
1,
2,
3,
4]. They have many applications: in combinatorial problems, in cryptography to generate pseudo-random sequences, and in biology to investigate genome sequences [
5]. It is known that binary de Bruijn sequences can be generated by Nonlinear Feedback Shift Registers (NLFSRs) [
6]. NLFSRs are the main components in constructing stream ciphers. Knowing a de Bruijn sequence, one can apply the cross-join method to construct new de Bruijn sequences [
3,
7,
8,
9]. In papers [
10,
11] It was proved first that the cross-join method generates all de Bruijn sequences of given order. In [
10], an algorithm was explicitly given that begins with an arbitrary de Bruijn sequence from a finite alphabet and outputs a Hamiltonian path in the corresponding cross-join graph.
Paper [
9] generalizes the notion of de Bruijn sequences to multi de Bruijn sequences, where patterns of fixed length appear
m times (
m = 1 for ordinary de Bruijn sequences), that paper presents formulas for the total number of possible multi de Bruijn sequences with a specified set of parameters. However, it does not provide a method to generate any such sequence. Although the notion of multi de Bruijn sequences appears to be more complex than ordinary ones and may cater to more applications (they appear in some biological sequences investigations [
5],) one important consequence of the results of this paper is that all multi de Bruijn sequences stem, simply, from any ordinary de Bruijn sequence.
Following the proof given by Alhakim [
10], we prove that all multi de Bruijn sequences can be generated starting from one such sequence by using the cross-join method. The proof is non-constructive in the sense that one has to start with a particular multi de Bruijn sequence in order to apply the cross-join method. On the positive side, it is sufficient to form a trivial multi de Bruijn sequence by concatenating an ordinary de Bruijn sequence
m times with itself. Ordinary de Bruijn sequences can be constructed using various methods [
7,
12,
13,
14] (see also, [
3] and references therein). We implemented this method for the case of multi de Bruijn sequences of the type
and
(binary multi de Bruijn sequences of order 2 and 3 with multiplicity 2 and patterns of length 2 and 3, see below for a formal definition). The drawback of the cross-join method is that to find a cross-join pair, we potentially have to run over the whole sequence. This can still be reasonably done for sequences up to the order of 40.
Galois NLFSRs were considered in papers [
15,
16]. We confirmed experimentally that some sequences of type
can be generated by Galois NLFSRs listed in [
16]. In fact, they are modified sequences where one of the patterns has a lower multiplicity. It is an open problem whether all binary multi de Bruijn sequences can be generated using suitable Galois NLFSRs.
2. Multi de Bruijn Sequences
We introduce multi de Bruijn sequences following Tesler’s paper [
9]. Let
be a totally ordered alphabet of size
. A linear sequence is an ordinary sequence of elements of
denoted
. Define the
cyclic shift of a linear sequence by
In a cyclic sequence, we treat all rotations of a given linear sequence as equivalent. A
k-
is a sequence of length
k over
. The set of all
k-
over
is
. A
de Bruijn sequence is a cyclic sequence over alphabet
in which all
k-
occur exactly once. The length of such a sequence is
.
Definition 1. A multi de Bruijn sequence is a cyclic sequence over an alphabet Ω of size q in which each k- occurs exactly m-times with . k is the order of the sequence.
Let
denote the set of all such sequences. The length of such a sequence is
, since each of the
k-
accounts for
m starting positions. Tesler [
9] derived the formula for the cardinality of
. In the following, we consider multi de Bruijn sequences over an alphabet
with
q symbols
, addition modulo
q is used.
Definition 2. Let a sequence be represented as a sequence of its states , where each state is a k-mer . It is conjugate to a state if and . We denote this . The state is a companion of the state if and
Definition 3. Two pairs of vertices that allow the transformation of a de Bruijn cycle to another de Bruijn cycle are called cross-join pairs. Let a multi de Bruijn sequence be considered a cyclic sequence and represented as a sequence of states . Then the four states , form a cross-join pair for the sequence if they occur in the sequence in the listed order.
Definition 4. Let and be a cross-join pair. We construct a new multi de Bruijn sequence by swapping the successors of and and the successors of and . That is, by going from to the successor of , then from to the successor of and so on until closing the cycle. This construction is called the cross-join method.
To be more precise, let us denote
and
. Then the original sequence has states that proceed as:
After the cross-join operation, the modified sequence has states that proceed as:
The conjugate pair of states
splits the full cycle into two shorter cycles after interchanging their successors. Then the states
are on different cycles, and after interchanging their successors, we obtain a new de Bruijn cycle (see
Figure 1).
3. The Main Theorem
Definition 5. Let be two sequences from . The length of the sequences is . We take the least lexicographical representatives of both sequences and consider the length L of the longest common initial path of these sequences We define the function (pseudo-distance) of the sequences as .
Proposition 1. The function has the properties:
For all in if and only if
One can find examples of three multi de Bruijn sequences, which are concatenations of de Bruijn sequences of lower order for which the triangle inequality is not satisfied. It turns out that the pseudo-distance suffices to get our connectedness result. Unlike the case of ordinary de Bruijn cycles, the absence of the triangle inequality does not allow for the construction of a Hamiltonian path of multi de Bruijn cycles.
Definition 6. Let x and y be two distinct multi de Bruijn sequences. We say that y is a neighbor of x if y can be obtained from x by applying a sequence of cross-join operations.
We adapt the following proposition and its proof from the paper [
10].
Proposition 2. Let and be two distinct multi de Bruijn sequences from the set . Then there exists a multi de Bruijn sequence , which is a neighbor of x in such that .
Proposition 2 is crucial in the proof of following.
Theorem 1. Any two distinct multi de Bruijn sequences in can be connected by applying a sequence of the cross-join operations.
Proof. Let x and y be two distinct sequences in . By Proposition 2, x has a neighbor such that . If , then we are done; otherwise, the same argument can be iterated to get a sequence , which is a neighbor of , with . Due to the strict inequality, and since the number of sequences in is finite, it is evident that this iterative process must end at y after a finite number of steps l, leading to the desired path . □
Proof of Proposition 2. Let
x and
y be state sequences of multi de Bruijn sequences. We take the least lexicographical representatives of the sequences
and
, where
and
are successive states of the multi de Bruijn sequences. Let
be the maximal common initial sequence of
x and
y. Suppose that the sequence
is common to
x and
y and
is maximal, where
0 is the state of zeros
. Since
and for the successors of
in
x and
y at least one is distinct from the state
0. Let us refer to these successors as
and
. Since
x is a multi de Bruijn sequence, it contains every state
m times, so it must contain
. The latter is at least one of the states in
, the complement of
in
x; that is, the subsequence of
x that starts with
and extends till the end of the sequence, just before cycling back to
. Let
be the predecessor of the first occurrence of
in
x. Since
belongs to
, the state
is either in
or it is
itself. However, the latter would make the common initial sub-sequence of
x, and
y would extend to
, which contradicts the maximality of
. Now
and
are predecessors of the same state so they form a conjugate pair. Swapping their successors, we split
x into two cycles, a cycle
that includes the initial subsequence
, and another cycle
that includes the edge
.
The cycle
aligned to start with the initial subsequence
and the multi de Bruijn cycle
y have a maximal common initial sequence of states
where
. Let
be a complement of
in
. The rest of the proof depends on establishing the following. □
Claim 1. It is possible to join and by using a state in and a conjugate state in , i.e., there is a state in that has a conjugate in
To show this, suppose we cannot. Then let the successors of in y and be and , respectively. Obviously, since is common to y and and since is not on the path of y, there is at least one occurrence of the word in , the complement of in , as it cannot be on , by our assumption. Let be the predecessor of in . As before, we can argue that is in .
Interchanging the successors of
and
, we further split the cycle
into two cycles
and
with the former being the cycle that includes the initial subsequence
and that shares a larger still initial path with
y:
In essence, this process can be iterated, arranging and re-arranging vertices on the initial cycle
but without using vertices
, only a finite number of times. Let
k be the maximal number of iterations and let
be the resulting cycle that includes
with maximal initial path
that is common with the multi de Bruijn sequence
y. Under the assumption that the above claim is not valid, we prove the following: the sub-path of the cycle
that begins with
and ends with
0 is simply an edge
. That is,
is the last vertex before rounding back to
.
To see this, suppose that is the successor of in . Let be the successor of in y, so that and are companion vertices. We then see that is not the last vertex in y for otherwise, the multi de Bruijn sequence y would be shorter than . Hence, is not the initial of . Consequently, one occurrence of is either in or in the part of . If the first case is true, swapping the predecessor of in with the predecessor of (which is evidently one of the vertices of ) shows that and can be joined into multi de Bruijn sequence using a vertex outside , contradicting the original assumption of the Claim.
If the second case is true, that is, if belongs to or any of the cycles made by the previous iteration and that are at most (equivalently, it is one of the vertices of ), then we can swap the predecessors of and to get yet another cycle that shares a longer initial segment with y, contradicting the maximality of . It follows that is the last vertex in .
We now prove the following: includes all predecessors of 0. We prove this in a way similar to the proof of the previous statement. In effect, suppose that U is a predecessor of 0 that is not on . If U belongs to , we get a contradiction because we could have joined and by swapping, for example, the successors of U and the last vertex in before the initial subsequence , which is, of course, in . Likewise, the presence of U on any of the intermediate cycles contradicts the maximality of k.
The validity of this last means that the sequence cannot be continued into a multi de Bruijn sequence as it cannot cycle back to 0 without using one of the predecessors of zero. This, of course, is not true because is already the initial path of the multi de Bruijn sequence y. This contradiction means that the Claim must be true.
We have thus proven that
and
can be joined by swapping the successors of a vertex in
with that of a conjugate vertex in
. This makes a new multi de Bruijn sequence
z which is a neighbor of
x. Since
and
z satisfies the inequality
as desired.
Proposition 3. Starting with a multi de Bruijn sequence in and applying the cross-join method generates all sequences in .
We present now the formula for the number of elements of
[
9].
where
is the Euler totient function and
where
m is repeated
m times. We calculate
We have implemented the cross-join method for the multi de Bruijn sequences of the type
and
. The implementation has been done in SAGE [
17]. For each sequence, the succeeded states are represented as decimals, and the sequence representative is the least lexicographical one. We have started from the first sequences in the list in
Table 1 and generated all sequences. First, we find the cross-join pairs from the chosen sequence and generate the corresponding sequences. Then we choose a new de Bruijn sequence and repeat the process. We then check whether all the sequences are different and throw away the repeated ones. After a few steps of this process, we find all sequences of a given type.
The List. The feedback functions of NLFSRs generated the sequences shown in
Table 2. + is understood as modulo 2 addition. This table is a part of the
Table 3 of Dubrova et al. [
16].
# | | | | |
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
8 | | | | |
9 | | | | |
10 | | | | |
11 | | | | |
12 | | | | |
13 | | | | |
14 | | | | |
15 | | | | |
16 | | | | |
17 | | | | |
18 | | | | |