Next Article in Journal
Statistical Mechanics of Directed Networks
Previous Article in Journal
Image Encryption Method Based on Three-Dimensional Chaotic Systems and V-Shaped Scrambling
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Some New Constructions of q-ary Codes for Correcting a Burst of at Most t Deletions †

Science, Mathematics and Technology (SMT) Cluster, Singapore University of Technology and Design, Singapore 487372, Singapore
*
Author to whom correspondence should be addressed.
Part of the results of this article has been presented at the 2024 IEEE International Symposium on Information Theory, Athens, Greece, 7–12 July 2024.
Entropy 2025, 27(1), 85; https://doi.org/10.3390/e27010085
Submission received: 17 December 2024 / Revised: 9 January 2025 / Accepted: 15 January 2025 / Published: 18 January 2025
(This article belongs to the Special Issue Coding Theory and Its Applications)

Abstract

:
In this paper, we construct q-ary codes for correcting a burst of at most t deletions, where t , q 2 are arbitrarily fixed positive integers. We consider two scenarios of error correction: the classical error correcting codes, which recover each codeword from one read (channel output), and the reconstruction codes, which allow to recover each codeword from multiple channel reads. For the first scenario, our construction has redundancy log n + 8 log log n + o ( log log n ) bits, encoding complexity O ( q 7 t n ( log n ) 3 ) and decoding complexity O ( n log n ) . For the reconstruction scenario, our construction can recover the codewords with two reads and has redundancy 8 log log n + o ( log log n ) bits. The encoding complexity of this construction is O q 7 t n ( log n ) 3 , and decoding complexity is O q 9 t ( n log n ) 3 . Both of our constructions have lower redundancy than the best known existing works. We also give explicit encoding functions for both constructions that are simpler than previous works.

1. Introduction

The study of deletion/insertion correcting codes, which originated in the 1960s, has made a great progress in recent years, encouraged by their application to DNA-based data storage. One of the basic problems in this area is construction of codes with low redundancy and low encoding/decoding complexity, where the redundancy of a q-ary ( q 2 ) code C of length n is defined as n log q | C | , measured in q-ary symbols, or ( n log q | C | ) log q , measured in bits (in this paper, for simplicity, for any real x > 0 , we write log 2 x = log x ). The optimal redundancy of a q-ary t-deletion correcting code of length n was proved to be asymptotically between t log n + t log q + o ( log q n ) and 2 t log n + t log q + o ( log q n ) [1]. In general, codes with a redundancy matching the upper bound can be constructed by graph-coloring method. However, the encoding complexity of such a construction is exponential in n. In practice, the construction of codes with polynomial encoding complexity (also called explicit construction) and low redundancy is an interesting research problem.
The famous VT codes were proved to be a family of single-deletion correcting binary codes, and are asymptotically optimal in redundancy [2]. The VT construction was generalized to nonbinary single-deletion correcting codes in [3] and, recently, a different version of nonbinary VT codes was proposed in [4] using differential vector, with asymptotically optimal redundancy and efficient encoding/decoding. Works on binary and nonbinary codes for correcting multiple deletions can be found in [5,6,7,8,9,10,11,12,13] and the references therein.
Burst deletions and insertions, which means that deletions and insertions occur at consecutive positions in a string, are a class of error that can be found in many applications, such as DNA-based data storage and file synchronization. For the binary case, the maximal cardinality of a t-burst-deletion correcting code (i.e., a code that can correct a burst of exactly t deletions) is proved to be asymptotically upper bounded by 2 n t + 1 / n [14], so its redundancy is asymptotically lower bounded by log n + t 1 . Several constructions of binary codes correcting a burst of exactly t deletions were reported in [15,16], where the construction in [16] achieves an optimal redundancy of log n + ( t 1 ) log log n + k log k . A more general class, i.e., codes correcting a burst of at most t deletions, were also constructed in the same paper [16], and this construction was improved in [17] to achieve a redundancy of log t log n + ( t ( t + 1 ) / 2 1 ) log log n + c t for some constant c t that only depends on t. In [18], by using VT constraint and shifted VT constraint in the so-called ( p , δ ) -dense strings, binary codes correcting a burst of at most t deletions were constructed, with an optimal redundancy of log n + t ( t + 1 ) / 2 log log n + c t , where c t is a constant depending only on t.
In recent parallel works [12,19], q-ary codes correcting a burst of at most t deletions were constructed for even integer q > 2 , with redundancy log n + O ( log q log log n ) or, more specifically, log n + ( 8 log q + 9 ) log log n + γ t + o ( log log n ) bits for some constant γ t that only depends on t. The basic techniques in [12,19] are to represent each q-ary string as a binary matrix whose columns are the binary representation of the entries of the corresponding q-ary string. Strings of length n are constructed such that the first row of their matrix representation is ( p , δ ) -dense. Then, the first row of their matrix representation is protected by a binary burst deletion correcting code of length n and the other rows are protected by binary burst deletion correcting codes of length not greater than 2 δ , which results in a construction with log n + O ( log q log log n ) bits of redundancy. A different construction of q-ary codes correcting a burst of at most t deletions was reported in a more recent work [4], which has redundancy log n + O ( t 2 log log n ) + O ( t log q ) .
A relaxed model of error correction, called sequence reconstruction, also received great attention from researchers (e.g., see [20,21,22,23,24,25,26,27,28,29,30,31,32]). Unlike the classical error correcting codes, in the sequence reconstruction model, the receiver is allowed to reconstruct the original transmitted sequence from multiple noisy reads (channel outputs). This model is suitable for DNA data storage because current synthesis and sequencing technologies can generate many (possibly erroneous) reads for each DNA strand, and so each stored DNA strand can be recovered by its many erroneous copies. Sequence reconstruction for deletion, insertion, transposition, and substitution was first studied in [20,21], where the minimum number of reads for exact reconstruction of uncoded sequence was computed. Coded sequence reconstruction for deletion channel was considered recently in [23], where it was assumed that a codeword of a single-deletion-correcting code is transmitted over the t-deletion channels, and the minimum number of distinct reads required to uniquely reconstruct the transmitted sequence was computed. The more general problem, i.e., the minimum number of reads for reconstruction of a codeword of an ( 1 ) -deletion-correcting code of length n transmitted over the t-deletion channels for some 1 t < n , was solved in [27]. The dual problem, i.e., designing codes (called reconstruction codes) for reconstruction of a sequence with fixed number of reads for deletion channel, was also considered in recent years. A construction of binary reconstruction codes for two reads and with log log n + O ( 1 ) bits of redundancy under single-deletion channel was presented in [24], and this construction was generalized in [28] to q-ary single-edit channel for q 2 . Binary reconstruction codes under 2-deletion channel were constructed in [29,30]. It was shown in [30] that 3 log n + o ( log n ) bits of redundancy is sufficient for two reads, and log n + o ( log n ) bits of redundancy is sufficient for five reads. Reconstruction codes under single-burst-insertion/deletion/edit channel were considered in [31], where for the channel suffering from a single burst of at most t deletions, a family of q-ary codes for two reads with t ( t + 1 ) / 2 log log n + O ( 1 ) bits of redundancy were constructed.
In this paper, we propose some new constructions of q-ary codes for correcting a burst of at most t deletions for any fixed positive integers t and q 2 . We consider both the classical error correcting codes and the reconstruction codes. In our constructions, we consider q-ary ( p , δ ) -dense strings (sequences), which are generalization of the binary ( p , δ ) -dense strings defined in [18], and give an efficient algorithm for encoding and decoding of q-ary ( p , δ ) -dense strings. For the classical burst-deletion correcting codes, a VT-like function is used to locate the deletions within an interval of length not greater than 3 δ , which results in log n bits of redundancy. In addition, two functions are used to recover the substring destroyed by deletions, which results in 8 log log n + o ( log log n ) bits of redundancy (in this paper, the term o ( log log n ) may depends on q and t. However, since q and t are assumed to be fixed positive integers, they are omitted). Thus, the total redundancy of our construction is log n + 8 log log n + o ( log log n ) bits. Compared to previous works, the redundancy of our new construction is independent of q and t in the second term. An explicit encoding function is given, which is simpler than previous works and has complexity O ( q 7 t n ( log n ) 3 ) . The decoding complexity is O ( n log n ) .
We also construct reconstruction codes for correcting a burst of at most t deletions from two reads and with redundancy 8 log log n + o ( log log n ) bits, which is lower than the construction in [31]. We give an explicit encoding function for such codes with encoding complexity O q 7 t n ( log n ) 3 and decoding complexity O q 9 t ( n log n ) 3 .
In Section 2, we introduce related notations and concepts, and present some basic constructions that will be used in our new constructions. In Section 3, we study ( p , δ ) -dense q-ary strings. A new construction of classical q-ary burst-deletion correcting codes is given in Section 4, and q-ary codes for correcting burst deletions from two reads is given in Section 5. Finally, the paper is concluded in Section 6.

2. Preliminaries

Let [ m , n ] = { m , m + 1 , , n } for any two integers m and n, such that m n , and call [ m , n ] an interval. If m > n , then let [ m , n ] = Ø . For simplicity, we denote [ n ] = [ 1 , n ] for any positive integer n. The size of a set S is denoted by | S | .
Given any integer q 2 , let Σ q = { 0 , 1 , 2 , , q 1 } . For any sequence (also called a string or a vector) x Σ q n , n is called the length of x , and denote | x | = n . We will denote x = ( x 1 , x 2 , , x n ) or x = x 1 x 2 x n . For any set I = { i 1 , i 2 , , i m } [ n ] such that i 1 < i 2 < < i m , denote x I = x i 1 x i 2 x i m , and call x I a subsequence of x . If I = [ i , j ] for some i , j [ 1 , n ] such that i j , then x I = x [ i , j ] = x i x i + 1 x j is called a substring of x . We say that x contains p ( or p is contained in x ) if p is a substring of x . For any x Σ q n and y Σ q n , we use x y to denote their concatenation, i.e., x y = x 1 x 2 x n y 1 y 2 y n . We also use notations such as x 0 , x 1 , , x k to denote substrings of a sequence x . For example, the notation x = x 1 x 2 x k means that the sequence x consists of k substrings x 1 , x 2 , , x k .
Let t n be a nonnegative integer. For any x Σ q n , let D t ( x ) denote the set of subsequences of x of length n t , and let B t ( x ) denote the set of subsequences y of x that can be obtained from x by a burst of t deletions, that is, y = x [ n ] D for some interval D [ n ] of length t (i.e., D = [ i , i + t 1 ] for some i [ n t + 1 ] ) . Moreover, let B t ( x ) = t = 0 t B t ( x ) , i.e., B t ( x ) is the set of subsequences of x that can be obtained from x by a burst of at most t deletions. Clearly, B 1 ( x ) = D 1 ( x ) and B t ( x ) D t ( x ) for t 2 .
A code C Σ q n is said to be a t-deletion correcting code if, for any codeword x C , given any y D t ( x ) , x can be uniquely recovered from y ; the code C Σ q n is said to be capable of correcting a burst of at most t deletions if, for any x C , given any y B t ( x ) , x can be uniquely recovered from y . More generally, let B { D t , B t } and N be a positive integer. A code C Σ q n is said to be an ( n , N , B ) -reconstruction code if, for any codeword x C , x can be uniquely recovered from any given N distinct sequences in B ( x ) . In this case, we also say that x can be uniquely recovered from N reads in B ( x ) . If N = 1 , then an ( n , N , B ) -reconstruction code degenerates to the classical error correcting code for the error pattern B .
For any code C Σ q n , the redundancy of C is defined as n log q | C | measured in q-ary symbols or log q ( n log q | C | ) measured in bits. Clearly, if there is an encoding function that maps each length-k sequence (message) to a length-n codeword in C , then the redundancy of C is n k . In this paper, we will always assume that q and t are fixed (i.e., q and t are constant with respect to n ) .
A convenient way for constructing deletion correcting codes is to construct some sketches such that for sufficiently many sequences, each can be recovered from its (known) sketches and one of its subsequence obtained by (a burst of) at most t deletions. The VT syndrome is a sketch for correcting a single deletion, which is defined as follows. For each c = ( c 1 , c 2 , , c n ) Σ 2 n , the VT syndrome of c is defined as
VT ( c ) = i = 1 n i c i mod ( n + 1 ) .
It was proved in [2] that for any c Σ 2 n , given VT ( c ) and any y D 1 ( c ) , one can uniquely recover x .
Suppose q > 2 . For each x = ( x 1 , x 2 , , x n ) Σ q n , let ϕ ( x ) = ( ϕ ( x ) 1 , ϕ ( x ) 2 , , ϕ ( x ) n ) Σ 2 n be such that, for each i [ 2 , n ] , ϕ ( x ) i = 1 if x i x i 1 and ϕ ( x ) i = 0 if x i < x i 1 . Moreover, let ϕ ( x ) 1 = 0 for all x Σ q n . ( one can also let ϕ ( x ) 1 = 1 for all x Σ q n ) . Then, there are q-ary codes constructed in [3] for correcting a single deletion.
Lemma 1.
[3] For any x Σ q n , given VT ( ϕ ( x ) [ 2 , n ] ) , Sum ( x ) and any y D 1 ( x ) , one can uniquely recover x , where ϕ ( x ) [ 2 , n ] = ( ϕ ( x ) 2 , , ϕ ( x ) n ) and
Sum ( x ) = i = 1 n x i mod q .

2.1. A Construction of q-ary Burst-Deletion Codes

For codes correcting burst deletions, the following lemma gives a q-ary version the construction in [33] to q-ary codes ( q > 2 ) , and will be used in our new construction.
Lemma 2.
For any positive integer m, there exists a function
h syn : Σ q m Σ q 4 log q m + o ( log q m ) ,
computable in time O ( q t m 3 ) , such that h syn ( x ) h syn ( x ) for any distinct x , x Σ q m with B t ( x ) B t ( x ) Ø .
Proof. 
The function h syn can be constructed by the syndrome compression technique developed in [33].
For each x Σ q m , let N t ( x ) be the set of all x Σ q m { x } , such that B t ( x ) B t ( x ) Ø . By simple counting, we have
| N t ( x ) | t m 2 q t .
We first construct a function h ¯ : Σ q m [ 0 , 2 R ¯ 1 ] satisfying: 1) R ¯ = t ( t + 1 ) 2 log m + log q ; and 2) h ¯ ( x ) h ¯ ( x ) for any x Σ q m and any x N t ( x ) . Specifically, h ¯ is constructed as follows: For each t [ t ] and j [ t ] , let
h ¯ t , j ( x ) = VT ( ϕ ( x I t , j ) [ 2 , m t , j ] ) , Sum ( x I t , j ) ,
where I t , j = { [ m ] : j mod t } and m t , j = | I t , j | . Then, let
h ¯ = h ¯ 1 , 1 , h ¯ 2 , 1 , h ¯ 2 , 2 , , h ¯ t , 1 , h ¯ t , 2 , , h ¯ t , t .
Clearly, | I t , j | m t and so, when represented as a binary sequence, the length | h ¯ ( x ) | of h ¯ ( x ) satisfies the following (throughout this paper, for any given q 2 , if needed, we will view a positive integer m as a q-ary sequence which is the q-base representation of m, and conversely, we also view a q-ary sequence z as a positive integer whose q-base representation is z ):
| h ¯ ( x ) | = t = 1 t j = 1 t h ¯ t , j ( x ) t = 1 t j = 1 t log m t + log q t ( t + 1 ) 2 log m + log q = R ¯ .
Hence, viewed as a positive integer, we have h ¯ ( x ) [ 0 , 2 R ¯ 1 ] for any x Σ q m . Moreover, for each t [ t ] , if y B t ( x ) , then we have y I t , j D 1 ( x I t , j ) for each j [ t ] , where I t , j = { [ n t ] : j ( mod t ) } . By Lemma 1, x I t , j can be recovered from h ¯ t , j ( x ) and y I t , j , and so x can be recovered from y and h ¯ ( x ) . Equivalently, for any x N t ( x ) , we have h ¯ ( x ) h ¯ ( x ) .
For each x Σ q m , let P ( x ) be the set of all positive integers j such that j is a divisor of | h ¯ ( x ) h ¯ ( x ) | for some x N t ( x ) . By the same discussions as in the proof of [33] (Lemma 4), we can obtain | P ( x ) | 2 log | N t ( x ) | + o ( log m ) O ( q t m 3 ) . (Note that q and t are assumed to fixed integers and, by (1), | N t ( x ) | t m 2 q t ) . So, by brute force search, one can find, in time 2 log | N t ( x ) | + o ( log m ) O ( q t m 3 ) , a positive integer α ( x ) 2 log | N t ( x ) | + o ( log m ) such that α ( x ) P ( x ) . Let h syn ( x ) = ( α ( x ) , h ¯ ( x ) mod α ( x ) ) . Then, we have h syn ( x ) h syn ( x ) for all x N t ( x ) . Equivalently, h syn ( x ) h syn ( x ) for any distinct x , x Σ q n with B t ( x ) B t ( x ) Ø .
Finally, since α ( x ) 2 log | N t ( x ) | + o ( log m ) is a positive integer, and by (1), | N t ( x ) | t m 2 q t , so viewed as a q-ary sequence, we have h syn ( x ) Σ q 4 log q m + o ( log q m ) .    □
Clearly, for any x Σ q m , given h syn ( x ) and any y B t ( x ) , one can uniquely recover x . This is because for any x x Σ q m such that y B t ( x ) , we have y B t ( x ) B t ( x ) , so B t ( x ) B t ( x ) Ø and by Lemma 2, we have h syn ( x ) h syn ( x ) . Thus, x is uniquely determined by h syn ( x ) and y .

2.2. Bounded Burst-Deletion Correction

We give a construction for correcting a single burst-deletion given the knowledge that the location of the deleted symbols are within an interval of length ρ .
Given a positive integer ρ , we define a collection of intervals
L ρ = { L j : j = 1 , 2 , , n / ρ 1 }
such that
L j = { [ ( j 1 ) ρ + 1 , ( j + 1 ) ρ ] , for j { 1 , , n / ρ 2 } , [ ( j 1 ) ρ + 1 , n ] , for j = n / ρ 1 .
The following remark is easy to see.
Remark 1.
The intervals in L ρ satisfy the following:
1) 
For any interval L [ n ] of length | L | ρ , there is a (not necessarily unique) j 0 [ n / ρ 1 ] = { 1 , 2 , , n / ρ 1 } , such that L L j 0 .
2) 
L j L j = Ø for all j , j [ 1 , n / ρ 1 ] , such that | j j | 2 .
Construction 1: Let h : Σ q m Σ q R m be a function for any positive integer m, where R m is a positive integer depending on m , q and t, such that h ( z ) h ( z ) for any distinct z , z Σ q m with B t ( z ) B t ( z ) Ø . Let L ρ = { L j : j = 1 , 2 , , n / ρ 1 } , such that each L j is defined by (2). For each x Σ q n and each { 0 , 1 } , let
h ˜ ρ ( ) ( x ) = j [ 1 , n / ρ 1 } : j mod 2 h ( x L j ) ( mod q R 2 ρ ) .
The modular operation ( mod q R 2 ρ ) in (3) is performed on the result of the summation, but not on each h ( x L j ) .
Lemma 3.
Suppose x x Σ q n . Suppose there exists an interval L [ n ] of length | L | ρ and two intervals D , D L of size | D | = | D | t , such that x [ n ] D = x [ n ] D . Then, we have h ˜ ρ ( ) ( x ) h ˜ ρ ( ) ( x ) for some { 0 , 1 } .
Proof. 
By (2), for each j { 1 , 2 , , n / ρ 1 } , the length | L j | of L j satisfies | L j | 2 ρ . So, for any x Σ q m , the length | h ( x L j ) | of h ( x L j ) (as a q-ary sequence) satisfies | h ( x L j ) | = R 2 | L j | R 2 ρ , which implies that (as an integer),
h ( x L j ) < q R 2 ρ .
Since | L | ρ , by 1) of Remark 1, there is a j 0 [ n / ρ 1 ] = { 1 , 2 , , n / ρ 1 } , such that L L j 0 . Let { 0 , 1 } be such that j 0 mod 2 , and let
Λ = { j [ 1 , n / ρ 1 ] : j j 0 mod 2 } .
Then, by 2) of Remark 1, L j L j 0 = Ø for all j Λ { j 0 } . Further, by assumption of D , D and L, we have x L j 0 D = x L j 0 D B t ( x L j 0 ) B t ( x L j 0 ) and x L j = x L j for all j Λ { j 0 } . Therefore, we have
h ( x L j 0 ) h ( x L j 0 )
and
h ( x L j ) = h ( x L j ) , j Λ { j 0 } .
By the above discussions, and by Construction 1, we can obtain that h ˜ ρ ( ) ( x ) h ˜ ρ ( ) ( x ) , which completes the proof.    □
Remark 2.
Let h be the function h s y n constructed in Lemma 2. Then, we have R 2 ρ = 4 log q ( 2 ρ ) + o ( log q ( 2 ρ ) ) . Moreover, by Lemma 2, h is computable in time O ( q t ( 2 ρ ) 3 ) .

3. Pattern Dense Sequences

The concept of ( p , δ ) -dense sequences was introduced in [18], and was used to construct binary codes with redundancy log n + t ( t + 1 ) 2 log log n + c t for correcting a burst of at most t deletions, where n is the message length and c t is a constant only depending on t. In this section, we generalize the ( p , δ ) -density to q-ary sequences and derive some important properties for these sequences that will be used in our new construction in the next section.
The q-ary ( p , δ ) -dense sequences can be defined similarly to the binary ( p , δ ) -dense sequences as follows.
Definition 1.
Let d δ n be three positive integers and p Σ q d called a pattern. A sequence x Σ q n is said to be ( p , δ ) -dense if each substring of x of length δ contains at least one p . The indicator vector of x with respect to p is a vector
1 p ( x ) = 1 p ( x ) 1 , 1 p ( x ) 2 , , 1 p ( x ) n Σ 2 n
such that, for each i [ n ] , 1 p ( x ) i = 1 if x [ i , i + d 1 ] = p , and 1 p ( x ) i = 0 otherwise.
In this section, we will always let ( d = 2 t )
p = 0 t 1 t
and view p = 0 t 1 t Σ q 2 t for any q 2 . Moreover, from Definition 1, we have the following simple remark.
Remark 3.
Each sequence x Σ q n can be written as the form x = x 0 p x 1 p x 2 p x m 1 p x m for some m 0 , such that each x i , i [ 0 , m ] , is a (possibly empty) string that does not contain p . Moreover, x is ( p , δ ) -dense if and only if it satisfies the following: (1) the lengths of x 0 and x m are not greater than δ 2 t ; and (2) the length of each x i , i [ 1 , m 1 ] , is not greater than δ + 1 4 t .
In [18], the VT syndrome of a p ( x ) was used to bound the location of deletions for ( p , δ ) -dense x , where a p ( x ) is a vector of length n p ( x ) + 1 , whose i-th entry is the distance between positions of the i-th and ( i + 1 ) -st 1 in the string ( 1 , 1 p ( x ) , 1 ) , and n p ( x ) is the number of 1s in 1 p ( x ) . In this paper, we prove that the VT syndrome of 1 p ( x ) plays the same role. Specifically, for each x Σ q n , let
a 0 ( x ) = i = 1 n 1 p ( x ) i
and
a 1 ( x ) = i = 1 n i · 1 p ( x ) i
where 1 p ( x ) is the indicator vector of x with respect to p , as defined in Definition 1. Then, we have the following lemma.
Lemma 4.
Suppose x Σ q n is ( p , δ ) -dense. For any t [ t ] and any y B t ( x ) , given a 0 ( x ) ( mod 4 ) and a 1 ( x ) ( mod 2 n ) , one can find, in time O ( n ) , an interval L [ n ] of length | L | 3 δ , such that y = x [ n ] D for some interval D L of size | D | = t = | x | | y | (in fact, we can require that the length of L is at most δ. However, the proof for | L | δ needs more careful discussions).
Proof. 
Let a 0 ( x ) = m and a 0 ( y ) = m . Then, by Remark 3, x and y can be written as the following form:
x = x 0 0 t 1 t x 1 0 t 1 t x 2 0 t 1 t x m 1 0 t 1 t x m
and
y = y 0 0 t 1 t y 1 0 t 1 t y 2 0 t 1 t y m 1 0 t 1 t y m
where x i and y j do not contain p = 0 t 1 t for each i [ 0 , m ] and j [ 0 , m ] . We denote
u i = | y 0 0 t 1 t y 1 0 t 1 t y i 1 0 t 1 t | , i [ 1 , m ]
and
v i = | y 0 0 t 1 t y 1 0 t 1 t y i | , i [ 0 , m ] .
Additionally, let u 0 = 0 . Clearly, for each i [ 0 , m ] , we have u i v i and y i = y [ u i + 1 , v i ] . Moreover, for each i [ 0 , m 1 ] , each j i [ u i , v i ] and j i + 1 [ u i + 1 , v i + 1 ] , we have
j i + 1 j i u i + 1 v i 2 t .
Note that a burst of t t deletions may destroy at most two ps or create at most one p , so Δ 0 m m { 1 , 0 , 1 , 2 } and Δ 0 can be computed from a 0 ( x ) a 0 ( y ) . We need to consider the following four cases according to Δ 0 .
Case 1: Δ 0 = 2 . Then, m = m 2 and there is an i d [ 0 , m ] such that | x i d + 1 | t 2 and y can be obtained from x by deleting a substring 1 t 1 x i d + 1 0 t 0 for some t 0 , t 1 > 0 , such that | x i d + 1 | + t 0 + t 1 = t . More specifically, y i d = x i d 0 t 1 t t 1 0 t t 0 1 t x i d + 2 . Clearly, we have 2 t t and x [ u i d + 1 , v i d + t ] = x i d 0 t 1 t x i d + 1 0 t 1 t x i d + 2 . It is sufficient to let L = [ u i d + 1 , v i d + t ] , but we still need to find i d .
Consider 1 p ( x ) and 1 p ( y ) . By Definition 1, 1 p ( x ) can be obtained from 1 p ( y ) by t insertions and two substitutions in the substring 1 p ( y ) [ u i d + 1 , v i d ] : inserting t 0s and substituting two 0s by two 1s. Then, by (5), we can obtain
a 1 ( x ) = a 1 ( y ) + λ 1 ( i d ) + λ 2 ( i d ) + ( m i d ) t
where λ 1 ( i d ) , λ 2 ( i d ) [ u i d + 1 , v i d + t ] are the locations of the two substitutions. To find i d , we define a function ξ 2 as follows: for every i [ 0 , m ] , let
ξ 2 ( i ) = a 1 ( y ) + 2 ( u i + 1 ) + ( m i ) t .
Then, for each i [ 0 , m 1 ] , we can obtain ξ 2 ( i + 1 ) ξ 2 ( i ) = 2 ( u i + 1 u i ) t 4 t t > 0 , where the first inequality comes from (6). So, for each i [ 0 , m 1 ] , we have
a 1 ( y ) < ξ 2 ( i ) < ξ 2 ( i + 1 ) ξ 2 ( m ) < a 1 ( y ) + 2 n ,
where the last inequality comes from the simple observation that ξ 2 ( m ) = a 1 ( y ) + 2 ( u m + 1 ) < a 1 ( y ) + 2 n .
By definition of ξ 2 and a 1 , we can obtain
ξ 2 ( i d + 1 ) a 1 ( x ) = 2 ( u i d + 1 + 1 ) λ 1 ( i d ) λ 2 ( i d ) t ( i ) 2 u i d + 1 + 2 2 ( v i d + t ) t ( ii ) 4 t + 2 3 t > 0
where (i) holds because λ 1 ( i d ) , λ 2 ( i d ) [ u i d + 1 , v i d + t ] , and (ii) is obtained from (6). On the other hand, by (7), a 1 ( x ) ξ 2 ( i d ) = λ 1 ( i d ) + λ 2 ( i d ) 2 ( u i d + 1 ) 0 (noticing that λ 1 ( i d ) , λ 2 ( i d ) [ u i d + 1 , v i d + t ] ) . Hence, we can obtain
ξ 2 ( i d ) a 1 ( x ) < ξ 2 ( i d + 1 ) .
By (8) and (9), i d and L can be found as follows: Compute
μ a 1 ( x ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
and
μ i ξ 2 ( i ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
for i from 0 to m . Then, we can find an i d [ 0 , m ] such that μ i d μ < μ i d + 1 , where μ m + 1 = 2 n . Let L = [ u i d + 1 , v i d + t ] . Note that x [ u i d + 1 , v i d + t ] = x i d 0 t 1 t x i d + 1 0 t 1 t x i d + 2 and x is ( p , δ ) -dense, so by Remark 3, the length of L satisfies | L | = | x i d 0 t 1 t x i d + 1 0 t 1 t x i d + 2 | 3 ( δ + 1 4 t ) + 4 t 3 δ , where the last inequality holds because 2 t t .
Case 2: Δ 0 = 1 . Then, m = m 1 and, similarly to Case 1, there is an i d [ 0 , m ] such that y i d can be obtained from x i d 0 t 1 t x i d + 1 by deleting t symbols and the pattern 0 t 1 t is destroyed. Clearly, x [ u i d + 1 , v i d + t ] = x i d 0 t 1 t x i d + 1 , and it is sufficient to let L = [ u i d + 1 , v i d + t ] . To find i d , consider 1 p ( y ) and 1 p ( x ) . By Definition 1, 1 p ( x ) can be obtained from 1 p ( y ) by t insertions and one substitution in the substring 1 p ( y ) [ u i d + 1 , v i d ] : inserting t 0s and substituting a 0 by a 1. By (5), we can obtain
a 1 ( x ) = a 1 ( y ) + λ ( i d ) + ( m i d ) t
where λ ( i d ) [ u i d + 1 , v i d + t ] is the location of the substitution. For every i [ 0 , m ] , let
ξ 1 ( i ) = a 1 ( y ) + ( u i + 1 ) + ( m i ) t .
Then, for each i [ 0 , m 1 ] , we have ξ 1 ( i + 1 ) ξ 1 ( i ) = u i + 1 u i t 2 t t > 0 , and so we can further obtain
a 1 ( y ) < ξ 1 ( i ) < ξ 1 ( i + 1 ) ξ 1 ( m ) a 1 ( y ) + n .
By definition of ξ 1 and a 1 , we can obtain ξ 1 ( i d + 1 ) a 1 ( x ) = u i d + 1 + 1 λ ( i d ) t > u i d + 1 + 1 ( v i + t ) t 2 t + 1 2 t > 0 . On the other hand, by (10), a 1 ( x ) ξ 1 ( i d ) = λ ( i d ) ( u i d + 1 ) 0 . Hence, we can obtain
ξ 1 ( i d ) a 1 ( x ) < ξ 1 ( i d + 1 ) .
By (11) and (12), L can be found as follows: Compute
μ a 1 ( x ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
and
μ i ξ 1 ( i ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
for i from 0 to m . Let i d [ 0 , m ] be such that μ i d μ < μ i d + 1 . Then, let L = [ u i d + 1 , v i d + t ] , where μ m + 1 = 2 n . Note that x [ u i d + 1 , v i d + t ] = x i d 0 t 1 t x i d + 1 and x is ( p , δ ) -dense, so by Remark 3, | L | = | x i d 0 t 1 t x i d + 1 | 2 ( δ + 1 4 t ) + 2 t < 2 δ .
Case 3: Δ 0 = 0 . Then, m = m . For every i [ 0 , m ] , let
ξ 0 ( i ) = a 1 ( y ) + ( m i ) t .
Note that x contains m copies of 0 t 1 t , so we have n 2 t m > m t . Therefore, for each i [ 0 , m 1 ] , we can obtain
a 1 ( y ) + n > a 1 ( y ) + m t ξ 0 ( i ) > ξ 0 ( i + 1 ) a 1 ( y ) .
As Δ 0 = 0 , there are two ways to obtain y from x :
1)
There is an i d [ 0 , m ] such that y i d can be obtained from x i d by a burst of t deletions. Correspondingly, by Definition 1, 1 p ( x ) can be obtained from 1 p ( y ) by inserting t 0s into 1 p ( y ) [ u i d + 1 , v i d ] . Therefore, we have
a 1 ( x ) = a 1 ( y ) + ( m i d ) t = ξ 0 ( i d ) .
2)
There is an i d [ 0 , m 1 ] such that x i d 0 t 1 t x i d + 1 = y i d 0 t + t 0 1 t + t 1 y i d + 1 for some t 0 , t 1 [ 1 , t 1 ] , such that t 0 + t 1 = t , and y i d 0 t 1 t y i d + 1 is obtained from x i d 0 t 1 t x i d + 1 by deleting the substring 0 t 0 1 t 1 . By Definition 1, 1 p ( x ) can be obtained from 1 p ( y ) by inserting t 0 0s in 1 p ( y ) [ u i d + 1 , v i d ] and t 1 0s in 1 p ( y ) [ v i d + 2 , v i d + 2 t ] . Therefore, we have
a 1 ( x ) = a 1 ( y ) + t 0 + ( m i d 1 ) t .
By definition of ξ 0 , we have ξ 0 ( i d ) a 1 ( x ) = t t 0 > 0 and a 1 ( x ) ξ 0 ( i d + 1 ) = t 0 > 0 . So, we can obtain
ξ 0 ( i d ) > a 1 ( x ) > ξ 0 ( i d + 1 )
For both cases, if i d [ 0 , m 1 ] , then we can L = [ u i d + 1 , v i d + 2 t + t ] ; if i d = m , then we can let L = [ u m + 1 , n ] . Note that x [ u i d + 1 , v i d + 2 t + t ] = x i d 0 t 1 t and x [ u m + 1 , n ] = x m , and since x is ( p , δ ) -dense, then by Remark 3, we have | L | = | x i d 0 t 1 t | 2 δ or | L | = | x m | 2 δ . Moreover, by (13), (14), and (15), i d (and so L ) can be found as follows: Compute
μ a 1 ( x ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
and
μ i ξ 0 ( i ) ( mod 2 n ) a 1 ( y ) ( mod 2 n )
for i from 0 to m. Then, we can always find an i d [ 0 , m ] , such that μ i d μ > μ i d + 1 , which is what we want.
Case 4: Δ 0 = 1 . Then, m = m + 1 and there is an i d [ 0 , m 1 ] such that x i d = y i d 0 t 0 s 0 t t 0 1 t y i d + 1 or x i d = y i d 0 t 1 t 1 s 1 t t 1 y i d + 1 , where t 0 [ 1 , t ] , t 1 [ 1 , t 1 ] and s Σ q t , and y can be obtained from x by deleting s . In this case, we can let L = [ v i d + 1 , v i d + 2 t + t ] , and can obtain | L | = 2 t + t < δ . To find i d , we consider 1 p ( x ) and 1 p ( y ) . By Definition 1, 1 p ( x ) can be obtained from 1 p ( y ) by inserting t 0s into 1 p ( y ) [ v i d + 1 , v i d + 2 t ] and substituting 1 p ( y ) v i d + 1 = 1 by a 0. Therefore, we have
a 1 ( x ) = a 1 ( y ) ( v i d + 1 ) + ( m 1 i d ) t .
For every i [ 0 , m 1 ] , let
ξ 1 ( i ) = a 1 ( y ) ( v i + 1 ) + ( m 1 i ) t .
Then, for each i [ 0 , m 2 ] , we have ξ 1 ( i ) ξ 1 ( i + 1 ) = v i + 1 v i t > 0 , where the inequality is obtained from (6). Moreover, we have ξ 1 ( 0 ) = a 1 ( y ) 1 + ( m 1 ) t < a 1 ( y ) + 2 t m < a 1 ( y ) + n and ξ 1 ( m 1 ) = a 1 ( y ) ( v m 1 + 1 ) > a 1 ( y ) n . So, for each i [ 0 , m 2 ] , we can obtain
a 1 ( y ) + n > ξ 1 ( i ) > ξ 1 ( i + 1 ) > a 1 ( y ) n .
By (16), and by the definition of ξ 1 , we have a 1 ( x ) = ξ 1 ( i d ) . So, by (17), i d (and so L ) can be found by the following process: For i from 0 to m 1 , compute ξ 1 ( i ) . Then, we can always find an i d [ 0 , m 1 ] such that ξ 1 ( i d ) ( mod 2 n ) = a 1 ( x ) ( mod 2 n ) , which is what we want.
Thus, one can always find the expected interval L [ n ] . From the above discussions, it is easy to see that the time complexity for finding such L is O ( n ) .    □
In the rest of this section, we will use the so-called sequence replacement (SR) technique to construct q-ary ( p , δ ) -dense strings with only one symbol of redundancy for δ = 2 t q 2 t log n . The SR technique, which has been widely used in the literature (e.g., see [19,34,35,36]), is an efficient method for constructing strings with or without some constraints on their substrings. In this paper, to apply the SR technique to construct ( p , δ ) -dense strings, each length- δ string that does not contain p needs to be compressed to a shorter sequence, which can be realized by the following lemma.
Lemma 5.
Let δ = 2 t q 2 t log n and S Σ q δ be the set of all sequences of length δ that do not contain p = 0 t 1 t . For n q 6 t + 3 log q e 0.4 , there exists an invertible function
g : S Σ q δ log q n 6 t 2
such that g and g 1 are computable in time O ( δ ) .
Proof. 
The proof follows the same idea of Proposition 1 of [19], and is replicated here for completeness.
As each s S has length δ = 2 t q 2 t log n and does not contain p , then S can be viewed as a subset of Σ q 2 t { p } q 2 t log n , and we have
log q | S | log q q 2 t 1 q 2 t log n = ( 2 t ) q 2 t log n + log n log q 1 1 q 2 t q 2 t ( i ) ( 2 t ) q 2 t log n + ( log n + 1 ) log q 1 e = δ log q n log e log q e δ 1.4 log q n log q e ( ii ) δ log q n 6 t 2 ,
where (i) comes from the fact that 1 1 x x < 1 e for x 1 , and (ii) holds when 0.4 log q n + log q e 6 t + 3 , i.e., n q 6 t + 3 log q e 0.4 . Thus, each sequence in S can be represented by a q-ary sequence of length δ log q n 6 t 2 , which gives an invertible function g : S Σ q δ log q n 6 t 2 .
The computation of g and g 1 involve conversion of integers in [ 0 , q 2 t 1 q 2 t log n 1 ] between ( q 2 t 1 ) -base representation and q-base representation, so have time complexity O ( 2 t q 2 t log n ) = O ( δ ) .    □
In the rest of this section, we will always let
δ = 2 t q 2 t log n .
As we are interested in large n, we will always assume that n q 6 t + 3 log q e 0.4 . The following lemma gives a function for encoding q-ary strings to ( p , δ ) -dense strings.
Lemma 6.
There exists an invertible function, denoted by EncDen : Σ q n 1 Σ q n , such that for every u Σ q n 1 , x = EncDen ( u ) is ( p , δ ) -dense. Both EncDen and its inverse, denoted by DecDen , are computable in O ( n log n ) time.
Proof. 
Let g be the function constructed in Lemma 5. The functions EncDen and DecDen are described by Algorithms 1 and 2, respectively, where each integer i [ n ] is also viewed as a q-ary string of length log n which is the q-base representation of i.
The correctness of Algorithm 1 can be proved as follows:
1)
In the initialization step, if u ˜ = u [ n δ + 2 t , n 1 ] contains p then, clearly, x has length n. If u ˜ = u [ n δ + 2 t , n 1 ] does not contain p , then
x = ( u [ 1 , n ] , p , p , g ( ( u ˜ , 0 2 t ) ) , 0 log q n + 3 )
and so | x | = n + 4 t + | g ( ( u ˜ , 0 2 t ) ) | + log q n + 3 = n , where n = n δ + 2 t 1 , and by Lemma 5, | g ( ( u ˜ , 0 2 t ) ) | = δ log q n 6 t 2 . So, at the end of the initialization step, x has length n. Moreover, x [ n + 1 , n + 2 t ] = p , and the substring x [ n + 2 t + 1 , n ] has length δ 4 t + 1 .
2)
In each round of the replacement step, if x ˜ x [ i , i + δ 1 ] does not contain p for some i [ 1 , n δ + 1 ] , then by Lemma 5, | ( p , p , i , g ( x ˜ ) , 0 , 1 2 t , 0 ) | = δ = | x [ i , i + δ 1 ] | , so by replacement, the length of the appended string equals to the length of the deleted substring, and hence the length of x keeps unchanged.
3)
At the beginning of each round of the replacement step, we have x [ n + 1 , n + 2 t ] = p , so for i [ n + 2 t δ + 1 , n ] , the substring x [ i , i + δ 1 ] contains p . Equivalently, if x ˜ x [ i , i + δ 1 ] does not contain p for some i [ n δ + 2 , n ] , then it must be that i [ n δ + 2 , n + 2 t δ ] . In this case, | ( p , p , i , g ( ( x [ i , n ] , 0 ) ) , 0 , 1 2 t , 0 ) | = δ = | x [ i , n ] | , so by replacement, the length of the appended string equals to the length of the deleted substring, and hence the length of x keeps unchanged.
4)
By 1), 2) and 3), the substring x [ n + 1 , n δ + 1 ] is always of the form p u p p v p p w , where all substrings u , v , , w have a length not greater than δ + 1 4 t . So, by Remark 3, for each i [ n + 1 , n δ + 1 ] , the substring x [ i , i + δ 1 ] contains p .
5)
At the end of each round of the replacement step, the value of n strictly decreases, so the While loop will end after at most n rounds, and at this time, for each i [ 1 , n ] , the substring x [ i , i + δ 1 ] contains p , which combining with 4) implies that x is ( p , δ ) -dense.
The correctness of Algorithm 2 can be easily seen from Algorithm 1, so DecDen is the inverse of EncDen .
Note that Algorithms 1 and 2 have at most n rounds of replacement, and in each round, g (resp. g 1 ) needs to be computed, which has time complexity O ( δ ) = O ( log n ) by Lemma 5, so the total time complexity of Algorithms 1 and 2 is O ( n log n ) .    □
Algorithm 1: The function EncDen for encoding to ( p , δ ) -dense sequence
1: 
Input:    u Σ q n 1
2: 
Output:  x = EncDen ( u ) Σ q n such that x is ( p , δ ) -dense
3: 
(Initialization Step:)
4: 
Let  u ˜ = u [ n δ + 2 t , n 1 ] .
5: 
if  u ˜ contains p  then
6: 
     let n be the smallest i [ n δ + 2 t 1 , n 2 ] such that u [ i + 1 , i + 2 t ] = p , and let x = ( u , 1 ) ;
7: 
else
8: 
     let n = n δ + 2 t 1 and x = ( u [ 1 , n ] , p , p , g ( ( u ˜ , 0 2 t ) ) , 0 log q n + 3 ) .
9: 
end if
10: 
(Replacement Step:)
11: 
while there exists an i [ 1 , n ] such that x ˜ x [ i , i + δ 1 ] does not contain p  do
12: 
    if  i [ 1 , n δ + 1 ]  then
13: 
         delete x [ i , i + δ 1 ] from x and append ( p , p , i , g ( x ˜ ) , 0 , 1 2 t , 0 ) to x ; let n = n δ .
14: 
    end if
15: 
    if  i [ n δ + 2 , n ]  then
16: 
         delete x [ i , n ] from x and append ( p , p , i , g ( ( x [ i , n ] , 0 ) ) , 0 , 1 2 t , 0 ) to x , where δ | x [ i , n ] | satisfying 1 2 t 1 ; let n = i 1 .
17: 
    end if
18: 
end while
19: 
Return  x = EncDen ( u ) .
Algorithm 2: The function DecDen for decoding of ( p , δ ) -dense sequence
1: 
Input:   x = EncDen ( u ) Σ q n
2: 
Output:  u Σ q n 1
3: 
while  x [ n 2 , n ] = 01 0 for some [ 1 , 2 t ]  do
4: 
     let u ˜ be obtained from g 1 ( x [ n δ + 6 t + 1 + log q n , n 2 ] ) by deleting the last 2 t symbols; delete the last δ + 2 t symbols of x and insert u ˜ at the position i of x such that i = x [ n δ + 6 t + 1 , n δ + 6 t + log n ] .
5: 
end while
6: 
if  x n = x n 1 = 0   then
7: 
     let u ˜ be obtained from g 1 ( x [ n δ + 6 t , n log q n 3 ] ) by deleting the last 2 t 0s and let u = ( x [ 1 , n δ + 2 t 1 ] , u ˜ ) .
8: 
end if
9: 
if  x n = 1   then
10: 
    let u = x [ 1 , n 1 ] .
11: 
end if
12: 
Return  u = DecDen ( x ) .
The Algorithm 1 is obtained by modifying the Algorithm 2 of [19], which is for binary sequences but for our purpose we need apply it to q-ary sequences.

4. Burst-Deletion Correcting q -ary Codes

In this section, we still let
δ = 2 t q 2 t log n .
Based on ( p , δ ) -dense sequences, for any fixed positive integers t and q 2 , we will construct a family of q-ary codes that can correct a burst of at most t deletions. The basic idea of our construction is as follows: For each ( p , δ ) -dense sequence x , use the sketches a 0 ( x ) and a 1 ( x ) (defined by (4) and (5), respectively) to locate the deletions within an interval of length 3 δ . Then, use the functions h ˜ ρ ( 0 ) ( x ) , h ˜ ρ ( 1 ) ( x ) constructed in Construction 1 to uniquely recover x .
Let h : Σ q m Σ q R m be a sketch function for any positive integer m, where R m is a positive integer depending on m , q and t, such that h ( z ) h ( z ) for any distinct z , z Σ q m with B t ( z ) B t ( z ) Ø . For each x Σ q n , let
f ( x ) = a 0 ( x ) ( mod 4 ) , a 1 ( x ) ( mod 2 n ) , h ˜ ρ ( 0 ) ( x ) , h ˜ ρ ( 1 ) ( x )
such that a 0 ( x ) is defined by (4), a 1 ( x ) is defined by (5), h ˜ ρ ( 0 ) ( x ) and h ˜ ρ ( 1 ) ( x ) are obtained by Construction 1 with ρ = 3 δ = 6 t q 2 t log n .
Lemma 7.
Let f be the function given by (18). If x Σ q n is ( p , δ ) -dense, then given f ( x ) and any y B t ( x ) , one can uniquely recover x .
Proof. 
First, since x is ( p , δ ) -dense, then by Lemma 4, given a 0 ( x ) ( mod 4 ) and a 1 ( x ) ( mod 2 n ) , one can find an interval L of length | L | 3 δ = ρ , such that y = x [ n ] D for some interval D L of size t = n | y | . Moreover, suppose x Σ q n , such that y = x [ n ] D for some interval D L of size t = n | y | . Then, by Lemma 3, we have h ˜ ρ ( ) ( x ) h ˜ ρ ( ) ( x ) for some { 0 , 1 } . Thus, given f ( x ) = ( a 0 ( x ) ( mod 4 ) , a 1 ( x ) ( mod 2 n ) , h ˜ ρ ( 0 ) ( x ) , h ˜ ρ ( 1 ) ( x ) ) and any y B t ( x ) , one can uniquely recover x . □
Using the function f, we can give an encoding function of a q-ary code that can correct a burst of at most t deletions.
Lemma 8.
Let f be the function given by (18) and f q ( x ) be the q-ary representation of f ( x ) . Let
E : Σ q n 1 Σ q n + r u x , 0 t 1 , f q ( x )
such that x = EncDen ( u ) and r = t + 1 + | f q ( x ) | . Then, for each z = E ( u ) , given any y B t ( z ) , one can uniquely recover x (and so z ) .
Proof. 
Let t = | z | | y | . Suppose D = [ i d , i d + t 1 ] [ 1 , n + r ] is an interval such that y = z [ n + r ] D . If there is more than one interval D such that y = z [ n + r ] D , then we consider the D with the smallest i d . Clearly, we have i d [ 1 , n + r t + 1 ] . If i d [ 1 , n + t + 1 t ] , then y n + t + 1 t = z n + t + 1 = 1 ; if i d [ n + t + 2 t , n + r t + 1 ] , then y n + t + 1 t = z n + t + 1 t = 0 . So, we have the following two cases.
Case 1: y n + t + 1 t = 1 . Then, i d [ 1 , n + t t + 1 ] . We need further to consider the following three subcases.
Case 1.1: y [ n + 1 t , n + 1 + t t ] = 0 t 1 . In this case, it must be that D [ 1 , n ] . Therefore, we have y [ 1 , n t ] B t ( x ) and y [ n + t + 2 t , n + r t ] = f q ( x ) . By Lemma 7, x can be recovered from y [ 1 , n t ] and y [ n + t + 2 t , n + r t ] correctly.
Case 1.2: There is a t [ 1 , t 1 ] such that y [ n + 1 t + t , n + 1 + t t ] = 0 t t 1 and y n t + t 0 . In this case, it must be that D = [ n + 1 t + t , n + t ] . Therefore, y [ 1 , n + 1 t + t ] = x [ 1 , n + 1 t + t ] B t t ( x ) and y [ n + t + 2 t , n + r t ] = f q ( x ) . By Lemma 7, x can be recovered from y [ 1 , n + 1 t + t ] and y [ n + t + 2 t , n + r t ] correctly.
Case 1.3: y [ n + 1 , n + 1 + t t ] = 0 t t 1 and y n 0 . In this case, it must be that D [ n + 1 , n + t ] . Therefore, y [ 1 , n ] = x .
Case 2: y n + t + 1 t = 0 . Then, we have i d [ n + t + 2 t , n + r t + 1 ] and x = y [ 1 , n ] .
Thus, one can always uniquely recover x from y . □
Theorem 1.
There exists a q-ary code of length n capable of correcting a burst of at most t deletions, which has redundancy log n + 8 log log n + o ( log log n ) bits, encoding complexity O ( q 7 t n ( log n ) 3 ) and decoding complexity O ( n log n ) .
Proof. 
In Construction 1, let h be the function h syn given by Lemma 2 and let C syn be the code with the encoding function E constructed in Lemma 8. Then, C syn Σ q n is a code capable of correcting a burst of at most t deletions.
The redundancy of C syn is 1 + r = 1 + t + 1 + | f q ( x ) | = t + 2 + | f q ( x ) | in q-ary symbols. We now evaluate | f q ( x ) | . By Lemma 2, we have R 2 ρ = 4 log q ( 2 ρ ) + o ( log q ( 2 ρ ) ) . Moreover, since ρ = 6 t q 2 t log n , then for { 0 , 1 } , by (3) and Remark 2, the length | h ˜ ρ ( ) ( x ) | of h ˜ ρ ( ) ( x ) (as a q-ary string) satisfies
| h ˜ ρ ( ) ( x ) | < R 2 ρ = 4 log q ( 12 t q 2 t log n ) + o ( log q ( 12 t q 2 t log n ) ) = 4 log q log n + o ( log q log n ) .
So, by (4) and (5), the length | f q ( x ) | of f q ( x ) (as a q-ary string) satisfies
| f q ( x ) | = log q 4 + log q ( 2 n ) + | h ˜ ρ ( 0 ) ( x ) | + | h ˜ ρ ( 1 ) ( x ) | log q n + 8 log q log n + o ( log q log n ) .
Thus, the redundancy of C syn , measured in bits, is ( t + 2 + | f q ( x ) | ) log q log n + 8 log log n + o ( log log n ) .
Consider the encoding complexity of C syn . Note that, by (3) and Remark 2, h ˜ ρ ( 0 ) ( x ) and h ˜ ρ ( 1 ) ( x ) are computable in time O ( n q t ( 2 ρ ) 3 ) = O ( n q t ( 12 t q 2 t log n ) 3 ) = O ( q 7 t n ( log n ) 3 ) . Then, by (4), (5), and (18), f ( x ) is computable in time O ( q 7 t n ( log n ) 3 ) . Moreover, by Lemma 6, the mapping EncDen is computable in O ( n log n ) time, so by (19), the encoding complexity of C syn is O ( q 7 t n ( log n ) 3 ) .
For the decoding complexity of C syn , by Lemma 6, the inverse of EncDen is computable in O ( n log n ) time, so by the proof of Lemma 8, we only need to consider the complexity of recovering x from f ( x ) and any given y B t ( x ) . By Lemma 4 and 1) of Remark 1, one can locate the deletions within an interval L j 0 in time O ( n ) using a 0 ( x ) ( mod 4 ) and a 1 ( x ) ( mod 2 n ) . Then, x can be recovered by brute force searching in time O ( ( 2 ρ q t ) ( q t ( 2 ρ ) 3 ) ) = O ( q 8 t ( log n ) 4 ) . In fact, there are | L j 0 | · q t candidate sequences for x , and for each candidate sequence x , we need to verify whether h syn ( x L j 0 ) = h syn ( x L j 0 ) , which takes time O ( q t | L j 0 | 3 ) by Lemma 2. Hence, the time of brute force searching is
O ( | L j 0 | · q t ) ( q t | L j 0 | 3 ) O ( 2 ρ q t ) ( q t ( 2 ρ ) 3 ) ,
where the inequality holds because | L j 0 | 2 ρ . Thus, the decoding complexity of C syn is O ( n log n ) . □

5. Correcting Burst-Deletion with Two Reads

In this section, we construct a family of codes correcting a burst of at most t deletions with 2 reads. Our construction improves the construction in [31] in redundancy. We first recall the concept of period of a sequence.
Suppose T m are two positive integers and x Σ q m . We say that T is a period of x if x i = x i + T for any i [ 1 , m T ] .
The following two simple observations are easy to see and will be used in our construction.
Observation 1: Let x , x Σ q n and t [ t ] . Suppose D = [ i d , i d + t 1 ] , D = [ i d , i d + t 1 ] [ n ] such that i d i d . Then, x [ n ] D = x [ n ] D if and only if the following holds:
x i = x i , for i [ 1 , i d 1 ] , x i + t , for i [ i d , i d 1 ] , x i , for i [ i d + t , n ] .
Observation 2: Let x Σ q n and t [ t ] . Suppose J [ n ] is an interval such that | J | t and the substring x J of x has period t . Then, for any D = [ i d , i d + t 1 ] J and D = [ i d , i d + t 1 ] J , we have x [ n ] D = x [ n ] D .
Our construction will use ( p , δ ) -dense sequences for p = 0 t + 1 1 t + 1 . To apply Lemma 6 to construct ( p , δ ) -dense sequences, we need to let δ = 2 ( t + 1 ) q 2 ( t + 1 ) log n . Then, by Lemma 6, there exists an invertible function EncDen : Σ q n 1 Σ q n , such that x = EncDen ( u ) is ( p , δ ) -dense for every u Σ q n 1 . Moreover, both EncDen and its inverse DecDen are computable in O ( n log n ) time. Thus, in this section, we always assume that
p = 0 t + 1 1 t + 1 and δ = 2 ( t + 1 ) q 2 ( t + 1 ) log n .
Suppose x Σ q n is ( p , δ ) -dense and J [ n ] is an interval of length | J | δ . By Definition 1, there exists an i 0 J such that J 0 [ i 0 , i 0 + 2 ( t + 1 ) 1 ] J and x J 0 = p = 0 t + 1 1 t + 1 . For any positive integer T 2 t , let i 0 = i 0 + t + 1 T if T t , and let i 0 = i 0 if T t + 1 . Then, [ i 0 , i 0 + T ] J 0 J and x i 0 = 0 1 = x i 0 + T . Thus, T is not a period of x J . In other words, we have the following remark.
Remark 4.
Suppose x Σ q n is ( p , δ ) -dense, where p = 0 t + 1 1 t + 1 and δ = 2 ( t + 1 ) q 2 ( t + 1 ) log n . Then, the length of any substring of x of period T 2 t is at most δ.
Lemma 9.
Suppose x x Σ q n are both ( p , δ ) -dense. If | B t ( x ) B t ( x ) | 2 , then there exists an interval J [ n ] of length | J | δ + t and two intervals D , D J of size | D | = | D | t , such that x [ n ] D = x [ n ] D .
Proof. 
Suppose y , y B t ( x ) B t ( x ) and y y . Then, y = x [ n ] D 1 = x [ n ] D 1 for some intervals D 1 , D 1 [ n ] of size t 1 | D 1 | = | D 1 | t , and y = x [ n ] D 2 = x [ n ] D 2 for some intervals D 2 , D 2 [ n ] of size t 2 | D 2 | = | D 2 | t . For convenience in the discussions, we denote
D j = [ i j , i j + t j 1 ] and D j = [ i j , i j + t j 1 ] , j = 1 , 2 .
If | i j i j | δ for some j { 1 , 2 } , then let D = D j , D = D j and J = [ λ , λ + t j 1 ] such that λ = min { i j , i j } and λ = max { i j , i j } . Clearly, J, D, and D satisfy the desired conditions. So, in the following, we assume that | i j i j | > δ for each j { 1 , 2 } .
By the symmetry of x and x ( y and y ) , we can assume
i 1 < i 1 and t 2 t 1 .
Since y = x [ n ] D 1 = x [ n ] D 1 , where D 1 = [ i 1 , i 1 + t 1 1 ] , D 1 = [ i 1 , i 1 + t 1 1 ] and i 1 < i 1 , we can obtain from Observation 1 that
x i = y i = x i , for i [ 1 , i 1 1 ] , y i = x i + t 1 , for i [ i 1 , i 1 1 ] , y i t 1 = x i , for i [ i 1 + t 1 , n ] .
Similarly, since y = x [ n ] D 2 = x [ n ] D 2 , where D 2 = [ i 2 , i 2 + t 2 1 ] and D 2 = [ i 2 , i 2 + t 2 1 ] , we have
  • If i 2 < i 2 , then according to Observation 1,
    x i = y i = x i , for i [ 1 , i 2 1 ] , y i = x i + t 2 , for i [ i 2 , i 2 1 ] , y i t 2 = x i , for i [ i 2 + t 2 , n ] .
  • If i 2 > i 2 , then according to Observation 1,
    x i = y i = x i , for i [ 1 , i 2 1 ] , y i t 2 = x i t 2 , for i [ i 2 + t 2 , i 2 + t 2 1 ] , y i = x i , for i [ i 2 + t 2 , n ] .
Let
I 1 = [ i 1 , i 1 + t 1 1 ] and I 2 = [ i ¯ 2 , i ¯ 2 + t 2 1 ] ,
where i ¯ 2 = min { i 2 , i 2 } and i ¯ 2 = max { i 2 , i 2 } . Then, by (20), (21), and (22), we can easily obtain the following claim.
Claim 1: For all i [ n ] ( I 1 I 2 ) , we have x i = x i .
Since x x , by Claim 1, we have I 1 I 2 Ø . If | I 1 I 2 | t then, clearly, J = D = D = I 1 I 2 satisfies the desired conditions. So, in the following, we only need to consider | I 1 I 2 | > t . We have the following two cases.
Case 1: i 2 < i 2 . Then, min { i 2 , i 2 } = i 2 and max { i 2 , i 2 } = i 2 , and so I 2 = [ i 2 , i 2 + t 2 1 ] and I 1 I 2 = [ λ , λ ] , where λ = max { i 1 , i 2 } and λ = min { i 1 + t 1 1 , i 2 + t 2 1 } . We need to further divide this case into the following two subcases.
Case 1.1: t 2 < t 1 . Let D = [ λ , λ + t 2 1 ] and D = [ λ t 2 + 1 , λ ] . Then, D , D J and | D | = | D | t . Note that by Claim 1, x i = x i for every i [ n ] ( I 1 I 2 ) = [ n ] [ λ , λ ] , and by (20), x i = x i + t 2 for every i [ λ , λ t 2 ] . So, according to Observation 1, we can obtain
x [ n ] D = x [ n ] D .
Moreover, by (20) and (21), x i = x i t 2 = x i t 2 + t 1 for every i [ λ + t 2 , λ t 1 + t 2 ] . So, x [ λ + t 2 , λ ] has period t 1 t 2 , and by Remark 4, | [ λ + t 2 , λ ] | δ . Hence, we have
| J | = | I 1 I 2 | = | [ λ , λ ] | δ + t 2 δ + t .
Case 1.2: t 2 = t 1 . We will prove that this case is impossible by contradiction. Without loss of generality, assume i 1 i 2 . Since | I 1 I 2 | > t , then I 1 I 2 = [ i 2 , i ˜ 1 ] and i 2 i ˜ 1 t , where i ˜ 1 = min { i 1 , i 2 } . By (20) and (21), we have
x i = x i = x i + t 1
for every i [ i 1 , i 2 1 ] , i.e., x [ i 1 , i 2 + t 1 1 ] has period t 1 . So, we can obtain y = x [ n ] [ i 1 , i 1 + t 1 1 ] = x [ n ] [ i 2 , i 2 + t 1 1 ] = y (according to Observation 2), which contradicts to the assumption that y y .
Case 2: i 2 > i 2 . In this case, I 2 = [ i 2 , i 2 + t 2 1 ] and we have I 1 I 2 = [ λ , λ ] , where λ = max { i 1 , i 2 } and λ = min { i 1 + t 1 1 , i 2 + t 2 1 } . Let D = [ λ , λ + t 1 1 ] and D = [ λ t 1 + 1 , λ ] . Then, D , D J and | D | = | D | t . Note that by Claim 1, x i = x i for every i [ n ] ( I 1 I 2 ) = [ n ] [ λ , λ ] , and by (20), x i = x i + t 1 for every i [ λ , λ t 1 ] . So, according to Observation 1, we can obtain
x [ n ] D = x [ n ] D .
Moreover, by (20) and (22), we can obtain
x i = x i + t 2 = x i + t 2 + t 1
for every i [ λ , λ t 1 t 2 ] . Hence, x [ λ , λ ] is a substring of x of period t 1 + t 2 . By Remark 4, | [ λ , λ ] | δ , so | J | = | I 1 I 2 | = | [ λ , λ ] | δ δ + t . □
For n = 30 , we give three examples of y and y , satisfying conditions of the different cases in the proof of Lemma 9.
Example 1.
Suppose y = x [ n ] D 1 = x [ n ] D 1 and y = x [ n ] D 2 = x [ n ] D 2 , where D 1 = { 4 , 5 , 6 , 7 } , D 1 = { 27 , 28 , 29 , 30 } , D 2 = { 1 , 2 } and D 2 = { 23 , 24 } . Here, we have i 1 = 4 , i 1 = 27 and t 1 = 4 ; i 2 = 1 , i 2 = 23 and t 2 = 2 . Then, conditions of Case 1.1 are satisfied, and J = I 1 I 2 = [ 4 , 24 ] . Figure 1a is an illustration of this case. We can easily find that x i = x i + 2 for every i [ 4 , 22 ] and x i = x i for every i [ n ] J . So, by Observation 1, x [ n ] { 4 , 5 } = x [ n ] { 23 , 24 } . Moreover, x i = x i 2 = x i + 2 for every i [ 6 , 22 ] , so the substring x [ 6 , 24 ] of x has period 2.
Example 2.
Suppose y = x [ n ] D 1 = x [ n ] D 1 and y = x [ n ] D 2 = x [ n ] D 2 , where D 1 = { 1 , 2 , 3 , 4 } , D 1 = { 17 , 18 , 19 , 20 } , D 2 = { 9 , 10 , 11 , 12 } and D 2 = { 27 , 28 , 29 , 30 } . Here, we have i 1 = 1 , i 1 = 17 , i 2 = 9 , i 2 = 27 and t 1 = t 2 = 4 . Then, conditions of Case 1.2 are satisfied. Figure 1b is an illustration of this case. We can find that x i = x i = x i + 4 for every i [ 1 , 8 ] , i.e., x [ 1 , 12 ] has period 4. So, by Observation 2, we have y = x [ n ] D 1 = x [ n ] D 2 = y .
Example 3.
Suppose y = x [ n ] D 1 = x [ n ] D 1 and y = x [ n ] D 2 = x [ n ] D 2 , where D 1 = { 7 , 8 , 9 , 10 } , D 1 = { 27 , 28 , 29 , 30 } , D 2 = { 24 , 25 , 26 } and D 2 = { 1 , 2 , 3 } . Here, we have i 1 = 7 , i 1 = 27 and t 1 = 4 ; i 2 = 24 , i 2 = 1 and t 2 = 3 . Then, conditions of Case 2 are satisfied and J = I 1 I 2 = [ 7 , 26 ] . Figure 1c is an illustration of this case. We can easily find that x i = x i + 4 for every i [ 7 , 22 ] , and x i = x i for every i [ n ] J . So by Observation 1, x [ n ] { 7 , 8 , 9 , 10 } = x [ n ] { 23 , 24 , 25 , 26 } . Moreover, x i = x i + 3 = x i + 7 for every i [ 7 , 19 ] , i.e., x [ 7 , 26 ] has period 7.
In the following, we give a construction of q-ary codes for correcting a burst of at most t deletions with two reads. Let
ρ = δ + t = 2 ( t + 1 ) q 2 ( t + 1 ) log n + t
and L ρ = { L j : j = 1 , 2 , , n / ρ 1 } be defined by (2). For each { 0 , 1 } , let the function h ˜ ρ ( ) be obtained from Construction 1 by letting h be the function h syn given by Lemma 2. Then, let
f ˜ ( x ) = h ˜ ρ ( 0 ) ( x ) , h ˜ ρ ( 1 ) ( x ) .
Lemma 10.
For each x Σ q n , the function f ˜ ( x ) is computable in time O ( q 7 t n ( log n ) 3 ) , and when viewed as a binary string, the length | f ˜ ( x ) | of f ˜ ( x ) satisfies
| f ˜ ( x ) | 8 log log n + o ( log log n ) .
Moreover, if x is ( p , δ ) -dense, then given f ˜ ( x ) and any two distinct y , y B t ( x ) , one can uniquely recover x .
Proof. 
Note that | L j | 2 ρ = 4 ( t + 1 ) q 2 ( t + 1 ) log n + 2 t for each j. Then, by (3) and by Remark 2, the functions h ˜ ρ ( 0 ) ( x ) and h ˜ ρ ( 1 ) ( x ) are computable in time
O ( n q t ( 2 ρ ) 3 ) = O ( q 7 t n ( log n ) 3 ) .
Hence, by (23), f ˜ ( x ) is computable in time O ( q 7 t n ( log n ) 3 ) .
Again by (3) and by Remark 2, the length of f ˜ ( x ) (viewed as a binary string) satisfies
| f ˜ ( x ) | =   | h ˜ ρ ( 0 ) ( x ) | + | h ˜ ρ ( 1 ) ( x ) | = 2 log q R 2 ρ = 2 log q 4 log q ( 2 ρ ) + o ( log q ( 2 ρ ) ) = 8 log log n + o ( log log n ) ,
where the last equality comes from the assumption that ρ = δ + t = 2 ( t + 1 ) q 2 ( t + 1 ) log n + t .
Finally, we prove that if x Σ q n is ( p , δ ) -dense, then given f ˜ ( x ) and any two distinct y , y B t ( x ) , one can uniquely recover x . It suffices to prove that for any ( p , δ ) -dense x , x Σ q n , if | B t ( x ) B t ( x ) | 2 , then f ˜ ( x ) f ˜ ( x ) . This can be proved as follows. By Lemma 9, there exists an interval J [ n ] of length | J | δ + t = ρ and two intervals D , D J of size | D | = | D | t , such that x [ n ] D = x [ n ] D . Then, by Lemma 3, we have h ˜ ρ ( ) ( x ) h ˜ ρ ( ) ( x ) for some { 0 , 1 } . Therefore, by (23), we have f ˜ ( x ) f ˜ ( x ) . □
Theorem 2.
Let C ˜ syn be the code with the encoding function
E ˜ : Σ q n 1 Σ q n + r u x , w
where x = EncDen ( u ) , w = 0 t 1 , x [ n t + 1 , n ] , f ˜ q ( x ) , and r = | w | = 2 t   +   1   +   | f ˜ q ( x ) | . Then, C ˜ syn is an ( n , 2 , B t ) -reconstruction code with redundancy 8 log log n + o ( log log n ) bits, encoding complexity O q 7 t n ( log n ) 3 and decoding complexity O q 9 t ( n log n ) 3 .
Proof. 
We first prove that C ˜ syn is an ( n , 2 , B t ) -reconstruction code. We need to prove that for each codeword z = E ˜ ( u ) = x , 0 t 1 , x [ n t + 1 , n ] , f ˜ q ( x ) C ˜ syn , given any distinct y , y B t ( z ) , one can uniquely recover x .
Let y = z [ n + r ] D , where D [ n + r ] is an interval of size t = | D | = | z | | y | . By the same discussions as in the proof of Lemma 8, exactly one of the following three cases holds:
Case 1: D [ n + 1 , n + r ] . In this case, we have x = y [ 1 , n ] .
Case 2: There exists a t [ t 1 ] such that D [ n t + t + 1 , n + t ] . In this case, y [ 1 , n + 1 t + t ] = x [ 1 , n + 1 t + t ] and y [ n + t + 2 t , n + 2 t + 1 t ] = z [ n + t + 2 , n + 2 t + 1 ] = x [ n t + 1 , n ] . So, x = ( y [ 1 , n + 1 t ] , y [ n + t + 2 t , n + 2 t + 1 t ] ) .
Case 3: D [ n ] . In this case, we have y [ n t ] B t ( x ) and y [ n + 2 t + 1 t , n + r t ] = f ˜ q ( x ) .
Similarly, for y B t ( z ) , either x can be recovered from y or y = z [ n + r ] D for some interval D [ n ] of size | D | = | z | | y | t . Suppose y = z [ n + r ] D and y = z [ n + r ] D for some intervals D , D [ n ] . Then, y [ n | D | ] , y [ n | D | ] B t ( x ) and y [ n + 2 t + 1 | D | , n + r | D | ] = f ˜ q ( x ) . Moreover, we must have y [ n | D | ] y [ n | D | ] , because otherwise, from (24), we will obtain y = ( y [ n | D | ] , w ) = ( y [ n | D | ] , w ) = y , which contradicts to the assumption that y y . By Lemma 10, x can be uniquely recovered from y and y .
Thus, we proved that, given any distinct y , y B t ( z ) , one can uniquely recover x (and so z ) , which implies that C ˜ syn is an ( n , 2 , B t ) -reconstruction code.
By (24), the redundancy of C ˜ syn is r + 1 = t + 2 + | f ˜ q ( x ) | in q-ary symbols, which is at most 8 log log n + o ( log log n ) bits by Lemma 10.
The encoding complexity of C ˜ syn comes from the complexity of computing f ˜ q ( x ) , which is O q 7 t n ( log n ) 3 by Lemma 10. The decoding complexity of C ˜ syn by brute force searching is at most O ( n q t ) 2 q 7 t n ( log n ) 3 = O q 9 t ( n log n ) 3 . □

6. Conclusions and Discussions

We proposed new constructions of q-ary codes correcting a burst of at most t deletions, for both classical error correcting codes and reconstruction codes. By using q-ary ( p , δ ) -dense strings and bounded burst-deletion correcting codes, our constructions reduce the code redundancy in the log log n term and have simpler encoding functions, compared to existing works.
A more general problem is to construct q-ary ( n , N , D t ) -reconstruction codes, i.e., q-ary reconstruction codes of length n for t-deletion channel with N reads. Note that the best known explicit construction (regarding the redundancy) for classical t-deletion correcting binary codes of length n has redundancy ( 4 t 1 ) log n + o ( log n ) [11]. So, we are interested in constructing q-ary ( n , N , D t ) -reconstruction codes, for any fixed q and t, of length n and with redundancy k log n + o ( log n ) bits for some positive integer k such that 1 k < 4 t 1 and N depends only on k and t (and is independent of n).

Author Contributions

Methodology and writing, W.S.; Supervision and project administration, K.C. and T.Q.S.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Singapore Ministry of Education Academic Research Fund Tier 2 T2EP50221-0036.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Levenshtein, V.I. Bounds for Deletion/Insertion Correcting Codes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Lausanne, Switzerland, 30 June–5 July 2002. [Google Scholar]
  2. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions and reversals. Dokl. Akad. Nauk. SSR 1965, 163, 845–848. (In Russian) [Google Scholar]
  3. Tenengolts, G.M. Nonbinary codes, correcting single deletion or insertion. IEEE Trans. Inform. Theory 1984, 30, 766–769. [Google Scholar] [CrossRef]
  4. Nguyen, T.T.; Cai, K.; Siegel, P.H. A new version of q-ary Varshamov-Tenengolts codes with more efficient encoders: The differential VT codes and the differential shifted VT codes. IEEE Trans. Inform. Theory 2024, 70, 6989–7004. [Google Scholar] [CrossRef]
  5. Brakensiek, J.; Guruswami, V.; Zbarsky, S. Efficient low-redundancy codes for correcting multiple deletions. IEEE Trans. Inform. Theory 2018, 64, 3403–3410. [Google Scholar] [CrossRef]
  6. Sima, J.; Bruck, J. On optimal k-deletion correcting codes. IEEE Trans. Inform. Theory 2021, 67, 3360–3375. [Google Scholar] [CrossRef]
  7. Sima, J.; Gabrys, R.; Bruck, J. Optimal systematic t-deletion correcting codes. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
  8. Guruswami, V.; Hastad, J. Explicit two-deletion codes with redundancy matching the existential bound. IEEE Trans. Inform. Theory 2021, 67, 6384–6394. [Google Scholar] [CrossRef]
  9. Sima, J.; Gabrys, R.; Bruck, J. Optimal codes for the q-ary deletion channel. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
  10. Bitar, R.; Hanna, S.K.; Polyanskii, N.; Vorobyev, I. Optimal codes correcting localized deletions. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021. [Google Scholar]
  11. Song, W.; Polyanskii, N.; Cai, K.; He, X. Systematic codes correcting multiple-deletion and multiple-substitution errors. IEEE Trans. Inform. Theory 2022, 68, 6402–6416. [Google Scholar] [CrossRef]
  12. Song, W.; Cai, K. Non-binary two-deletion correcting codes and burst-deletion correcting codes. IEEE Trans. Inform. Theory 2023, 69, 6470–6484. [Google Scholar] [CrossRef]
  13. Liu, S.; Tjuawinata, I.; Xing, C. Explicit construction of q-ary 2-deletion correcting codes with low redundancy. arXiv 2023, arXiv:2306.02868. [Google Scholar] [CrossRef]
  14. Levenshtein, V. Asymptotically optimum binary code with correction for losses of one or two adjacent bits. Probl. Kibern. 1967, 19, 293–298. [Google Scholar]
  15. Cheng, L.; Swart, T.G.; Ferreira, H.C.; Abdel-Ghaffar, K.A.S. Codes for correcting three or more adjacent deletions or insertions. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014. [Google Scholar]
  16. Schoeny, C.; Wachter-Zeh, A.; Gabrys, R.; Yaakobi, E. Codes correcting a burst of deletions or insertions. IEEE Trans. Inform. Theory 2017, 63, 1971–1985. [Google Scholar] [CrossRef]
  17. Gabrys, R.; Yaakobi, E.; Milenkovic, O. Codes in the damerau distance for deletion and adjacent transposition correction. IEEE Trans. Inform. Theory 2017, 64, 2550–2570. [Google Scholar] [CrossRef]
  18. Lenz, A.; Polyanskii, N. Optimal codes correcting a burst of deletions of variable length. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
  19. Wang, S.; Tang, Y.; Sima, J.; Gabrys, R.; Farnoud, F. Nonbinary codes for correcting a burst of at most t deletions. IEEE Trans. Inform. Theory 2024, 70, 964–979. [Google Scholar] [CrossRef]
  20. Levenshtein, V.I. Efficient reconstruction of sequences. IEEE Trans. Inform. Theory 2001, 47, 2–22. [Google Scholar] [CrossRef]
  21. Levenshtein, V.I. Efficient reconstruction of sequences from their subsequences or supersequences. J. Combin. Theory Ser. A 2001, 93, 310–332. [Google Scholar] [CrossRef]
  22. Dudik, M.; Schulman, L.J. Reconstruction from subsequences. J. Combin. Theory Ser. A 2003, 103, 337–348. [Google Scholar] [CrossRef]
  23. Gabrys, R.; Yaakobi, E. Sequence reconstruction over the deletion channel. IEEE Trans. Inf. Theory 2018, 64, 2924–2931. [Google Scholar] [CrossRef]
  24. Chee, Y.M.; Kiah, H.M.; Vardy, A.; Vu, V.K.; Yaakobi, E. Coding for racetrack memories. IEEE Trans. Inf. Theory 2018, 64, 7094–7112. [Google Scholar] [CrossRef]
  25. Gabrys, R.; Milenkovic, O. Unique reconstruction of coded strings from multiset substring spectra. IEEE Trans. Inform. Theory 2019, 65, 7682–7696. [Google Scholar] [CrossRef]
  26. Chrisnata, J.; Kiah, H.M.; Yaakobi, E. Optimal reconstruction codes for deletion channels. In Proceedings of the 2020 International Symposium on Information Theory and Its Applications (ISITA), Kapolei, HI, USA, 24–27 October 2020. [Google Scholar]
  27. Pham, V.L.P.; Goyal, K.; Kiah, H.M. Sequence reconstruction problem for deletion channels: A complete asymptotic solution. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022. [Google Scholar]
  28. Cai, K.; Kiah, H.M.; Nguyen, T.T.; Yaakobi, E. Coding for sequence reconstruction for single edits. IEEE Trans. Inform. Theory 2022, 68, 66–79. [Google Scholar] [CrossRef]
  29. Chrisnata, J.; Kiah, H.M.; Yaakobi, E. Correcting deletions with multiple reads. IEEE Trans. Inf. Theory 2022, 68, 7141–7158. [Google Scholar] [CrossRef]
  30. Sun, Y.; Ge, G. Correcting two-deletion with a constant number of reads. IEEE Trans. Inform. Theory 2023, 69, 66–79. [Google Scholar] [CrossRef]
  31. Sun, Y.; Xi, Y.; Ge, G. Sequence reconstruction under single-burst-insertion/deletion/edit Channel. IEEE Trans. Inform. Theory 2023, 69, 4466–4483. [Google Scholar] [CrossRef]
  32. Abu-Sini, M.; Yaakobi, E. On the Intersection of Multiple Insertion (or Deletion) Balls and Its Application to List Decoding Under the Reconstruction Model. IEEE Trans. Inform. Theory 2024, 70, 3262–3297. [Google Scholar] [CrossRef]
  33. Sima, J.; Gabrys, R.; Bruck, J. Syndrome compression for optimal redundancy codes. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
  34. Wijngaarden, A.V.; Immink, K.A.S. Construction of maximum run-length limited codes using sequence replacement techniques. IEEE J. Sel. Areas Commun. 2010, 28, 200–207. [Google Scholar] [CrossRef]
  35. Nguyen, T.T.; Cai, K.; Immink, K.A.S.; Kiah, H.M. Capacity-approaching constrained codes with error correction for DNA-based data storage. IEEE Trans. Inform. Theory 2021, 67, 5602–5613. [Google Scholar] [CrossRef]
  36. Sima, J.; Bruck, J. Correcting k deletions and insertions in racetrack memory. IEEE Trans. Inform. Theory 2023, 69, 5619–5639. [Google Scholar] [CrossRef]
Figure 1. Illustration examples: each dot in the upper row represents a symbol of x , and each dot in the lower row represents a symbol of x , where two symbols with equal value are connected by a (solid or dashed) line segment. We can find that: (1) in (a), the substring x [ 6 , 24 ] of x has period 2; (2) in (b), the substring x [ 1 , 12 ] of x has period 4, so x [ n ] { 1 , 2 , 3 , 4 } = x [ n ] { 9 , 10 , 11 , 12 } ; and (3) in (c), the substring x [ 7 , 26 ] of x has period 7.
Figure 1. Illustration examples: each dot in the upper row represents a symbol of x , and each dot in the lower row represents a symbol of x , where two symbols with equal value are connected by a (solid or dashed) line segment. We can find that: (1) in (a), the substring x [ 6 , 24 ] of x has period 2; (2) in (b), the substring x [ 1 , 12 ] of x has period 4, so x [ n ] { 1 , 2 , 3 , 4 } = x [ n ] { 9 , 10 , 11 , 12 } ; and (3) in (c), the substring x [ 7 , 26 ] of x has period 7.
Entropy 27 00085 g001
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Song, W.; Cai, K.; Quek, T.Q.S. Some New Constructions of q-ary Codes for Correcting a Burst of at Most t Deletions. Entropy 2025, 27, 85. https://doi.org/10.3390/e27010085

AMA Style

Song W, Cai K, Quek TQS. Some New Constructions of q-ary Codes for Correcting a Burst of at Most t Deletions. Entropy. 2025; 27(1):85. https://doi.org/10.3390/e27010085

Chicago/Turabian Style

Song, Wentu, Kui Cai, and Tony Q. S. Quek. 2025. "Some New Constructions of q-ary Codes for Correcting a Burst of at Most t Deletions" Entropy 27, no. 1: 85. https://doi.org/10.3390/e27010085

APA Style

Song, W., Cai, K., & Quek, T. Q. S. (2025). Some New Constructions of q-ary Codes for Correcting a Burst of at Most t Deletions. Entropy, 27(1), 85. https://doi.org/10.3390/e27010085

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop