Combinatorial Methods for String Processing

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (1 December 2020) | Viewed by 16034

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editor


E-Mail Website
Guest Editor
Department of Informatics, ISEE, Kyushu University, Fukuoka 819-0395, Japan
Interests: algorithms and data structures; string processing; text compression

Special Issue Information

Dear Colleagues,

Strings, or sequences, are the most fundamental form of digital data. Due to recent developments in sensor network technologies and semi-automated machine-to-machine (M2M) communications, sequential data have been increasing more rapidly than ever. It is often the case that such semi-automatically generated sequential data contain abundant repetitive structures. Thus, understanding, revealing, and utilizing combinatorial objects that reside in strings are of great significance in designing efficient string processing algorithms and data structures. This Special Issue on “Combinatorial Methods for String Processing” aims exactly at this goal.

We cordially invite you to submit high-quality papers to this Special Issue. Typical topics of interests include (but are not limited to):

- Combinatorial string problems and solutions;

- Pattern matching algorithms;

- Applied combinatorics on words;

- Text compression;

- Data structures for strings, labeled trees, and compressed text.

Dr. Shunsuke Inenaga
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue polices can be found here.

Published Papers (5 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

26 pages, 1892 KiB  
Article
Reversed Lempel–Ziv Factorization with Suffix Trees
by Dominik Köppl
Algorithms 2021, 14(6), 161; https://doi.org/10.3390/a14060161 - 21 May 2021
Cited by 1 | Viewed by 3156
Abstract
We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the [...] Read more.
We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the same space but pay a multiplicative logarithmic time penalty. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

25 pages, 442 KiB  
Article
Subpath Queries on Compressed Graphs: A Survey
by Nicola Prezza
Algorithms 2021, 14(1), 14; https://doi.org/10.3390/a14010014 - 5 Jan 2021
Cited by 4 | Viewed by 3669
Abstract
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in [...] Read more.
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing to index regular languages and promising to shed new light on problems, such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

20 pages, 523 KiB  
Article
Re-Pair in Small Space
by Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai and Keisuke Goto
Algorithms 2021, 14(1), 5; https://doi.org/10.3390/a14010005 - 25 Dec 2020
Cited by 2 | Viewed by 2845
Abstract
Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given [...] Read more.
Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given a text of length n whose characters are drawn from an integer alphabet with size σ=nO(1), an O(min(n2,n2lglogτnlglglgn/logτn)) time algorithm computing Re-Pair with max((n/c)lgn,nlgτ)+O(lgn) bits of working space including the text space, where c1 is a fixed user-defined constant and τ is the sum of σ and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm seems not practical since a preliminary version already needs roughly one hour for computing Re-Pair on one megabyte of text. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

33 pages, 636 KiB  
Article
Computing Maximal Lyndon Substrings of a String
by Frantisek Franek and Michael Liut
Algorithms 2020, 13(11), 294; https://doi.org/10.3390/a13110294 - 12 Nov 2020
Cited by 1 | Viewed by 2768
Abstract
There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal Lyndon substrings [...] Read more.
There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal Lyndon substrings of the input string, and secondly, Franek et al. showed in 2017 a linear equivalence of sorting suffixes and sorting right-maximal Lyndon substrings of a string, inspired by a novel suffix sorting algorithm of Baier. In 2016, Franek et al. presented a brief overview of algorithms for computing the Lyndon array that encodes the knowledge of right-maximal Lyndon substrings of the input string. Among those presented were two well-known algorithms for computing the Lyndon array: a quadratic in-place algorithm based on the iterated Duval algorithm for Lyndon factorization and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying to it the next smaller value algorithm. Duval’s algorithm works for strings over any ordered alphabet, while for linear suffix sorting, a constant or an integer alphabet is required. The authors at that time were not aware of Baier’s algorithm. In 2017, our research group proposed a novel algorithm for the Lyndon array. Though the proposed algorithm is linear in the average case and has O(nlog(n)) worst-case complexity, it is interesting as it emulates the fast Fourier algorithm’s recursive approach and introduces τ-reduction, which might be of independent interest. In 2018, we presented a linear algorithm to compute the Lyndon array of a string inspired by Phase I of Baier’s algorithm for suffix sorting. This paper presents the theoretical analysis of these two algorithms and provides empirical comparisons of both of their C++ implementations with respect to the iterated Duval algorithm. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Graphical abstract

9 pages, 920 KiB  
Article
Efficient Data Structures for Range Shortest Unique Substring Queries
by Paniz Abedin, Arnab Ganguly, Solon P. Pissis and Sharma V. Thankachan
Algorithms 2020, 13(11), 276; https://doi.org/10.3390/a13110276 - 30 Oct 2020
Cited by 8 | Viewed by 2432
Abstract
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] [...] Read more.
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(nlogn)-word data structure with O(logwn) query time, where w=Ω(logn) is the word size. Our construction is based on a non-trivial reduction allowing for us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(nlogϵn) query time, where ϵ>0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012]. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

Back to TopTop