1. Introduction
One challenge in biology is to understand how species evolve, considering that new organisms arise from mutations that occurred in others. Using the principle of parsimony, the minimum number of rearrangements that transform one genome into another, called rearrangement distance, is a widely adopted way to estimate the evolutionary distance between two genomes. A genome rearrangement is a global mutation that alters the order and the orientation of the genes in a genome.
Depending on the genomic information available and the problems considered, a genome can be modeled in different ways. Considering that a genome has no repeated genes, we can model it as a permutation, with each element representing a gene. Each gene has an orientation inside the genome. Gene orientation is represented by a plus or minus sign in each element of the permutation. In this case, we say that the permutation is signed. Due to mapping problems in the genome, orientation information could not be available. In this scenario, we use unsigned permutations to represent the genomes. By representing one of the genomes as the identity permutation, we reduce the problem of transforming one permutation into another to the problem of sorting a permutation with the minimum number of rearrangements, which is called sorting rearrangement distance or simply distance.
A rearrangement model is the set of valid rearrangements used to calculate the distance. A reversal rearrangement inverts a segment of the genome, and a transposition rearrangement swaps two adjacent segments of the genome. Genome rearrangements are also sometimes called operations.
The problems of Sorting Permutations by Transpositions and Sorting Unsigned Permutations by Reversals are NP-Hard [
1,
2]. The best known results for both problems are approximation algorithms with factor 1.375 [
3,
4]. Despite that, the problem of Sorting Signed Permutations by Reversals is solvable in polynomial time [
5]. The variation in which the rearrangement model contains both (signed or unsigned) reversals and transpositions is NP-Hard [
6]. For signed permutations, there are 2-approximation algorithms [
7,
8]. For unsigned permutations, the best approximation factor is
[
7], where
k is the approximation factor of an algorithm used for cycle decomposition of the breakpoint graph [
9]. Given the best known value for
k [
9], this algorithm guarantees an approximation factor of
, where
.
Extra restrictions can be added to these sorting problems, such as applying operations that only affect specific parts of a genome [
10,
11]. Other variants of the sorting problems emerged from the assumption that operations which affect large portions of a genome are less likely to occur [
12]. Most of these variants add a constraint that limits the size of a valid operation [
13,
14,
15]. The size of an operation is equal to the number of elements affected by it. Considering a size limit of 2, the problems of Sorting Unsigned Permutations by Reversals and/or Transpositions are solvable in polynomial time [
13]. Considering a size limit of 3, the best known approximation factors for Sorting Unsigned Permutations by Reversals, by Transpositions, and by Reversals and Transpositions are 2 [
13],
[
14], and 2 [
15], respectively. For Sorting Signed Permutations by Reversals and by Reversals and Transpositions, the best known approximation factors are 5 [
16] and 3 [
16], respectively. Recently, Zhang et al. [
17] presented polynomial algorithms to decide if a permutation can be sorted using a number of short operations equal to the lower bound for the distance, which are based on the number of inversions and entropy of the permutation (these concepts are described in
Section 3 and
Section 4).
The problem of Sorting (Unsigned or Signed) Permutations by
-Operations is a generalization of the size limited variants in which an operation is valid if its size is less than or equal to
. There are
-approximation algorithms for Sorting Unsigned Permutations by
-Reversals, by
-Transpositions, and by
-Reversals and
-Transpositions [
18]. The study of
-operations is motivated by the observation that the rearrangements occurred in some species do not act on very large segments of the genome [
12,
19]. Using size limited operations makes more sense when one knows that the elements are not so far away from their original positions, so we introduce the study of Sorting
-Permutations by
-Operations. A permutation
is a
λ-permutation if all elements of
are at a distance less than
from their correct positions considering the identity permutation.
We consider the problems of sorting (unsigned or signed)
-permutations by
-reversals, by
-transpositions, and by
-reversals and
-transpositions. Some of the results for the problems of sorting unsigned
-permutations were previously presented by us [
20], but, to the best of our knowledge, there were no results for the sorting signed
-permutations problems.
The paper is organized as follows. In
Section 2, we present all concepts used in this paper and an NP-hardness proof for the sorting
-permutations by
-operations problems, considering four rearrangement models. In
Section 3 and
Section 4, we present
-approximation algorithms for the sorting unsigned and signed
-permutation problems, respectively. In
Section 5, we present algorithms with approximation factors of
and
for Sorting
-Permutations by
-Reversals and the other problems addressed, respectively. In
Section 6, we show experimental results which compare the algorithms presented in
Section 3,
Section 4 and
Section 5. We conclude the paper in
Section 7.
2. Preliminaries and Basic Facts
We denote a signed permutation by = … , where and , for all . We denote an unsigned permutation similarly but neglect the signs of the elements.
We assume that there are two extra elements and in , but, for convenience, they are omitted from the permutation’s representation. Given an integer as input, we say that is a λ-permutation if we have for all . We define the size of a permutation as the number of elements in it, without counting and .
Given two permutations and , the composition of and is equal to , such that , if , and , otherwise. The inverse permutation of , denoted by , is the permutation such that . For example, given , we have that .
An unsigned reversal is denoted by , with , and it transforms an unsigned permutation into . A signed reversal is denoted by , with , and when applied on a signed permutation = … , the resulting permutation is . The size of a (signed or unsigned) reversal is given by . For example, given the signed permutation = , we have = , and the size of such operation is . We say that or is a λ-reversal if .
An operation of transposition is denoted by , with 1 ≤ i < j < k ≤ , and when applied on a (unsigned or signed) permutation = … , the result is the permutation = … … . The size of a transposition is given by . For example, given the unsigned permutation = 5 6 1 2 , we have = 2 3 4 5 and the size of such operation is . We say that is a λ-transposition if .
The goal of these problems is to transform a -permutation into the identity permutation = 2 ⋯ by applying the minimum number of -operations, which defines the sorting distance, such that each permutation generated during the process is also a -permutation.
We denote by , , and the sorting distance for the sorting unsigned -permutation problems when we only have -reversals, -transpositions, and when we have both operations, respectively. Similarly, we denote by and the sorting distance for the sorting signed -permutation problems when we only have -reversals and when we have both operations, respectively.
2.1. Complexity of Sorting -Permutations by -Operations
Next, we show an NP-Hardness proof for the sorting distance problem considering some rearrangement models. For this purpose, we define the following decision problems:
Sorting -Permutations by -Operations: given a positive integer , a rearrangement model containing only -operations, a -permutation , and a non-negative integer k, decide if it is possible to sort using, at most, k operations that belong to .
Sorting Permutations by Rearrangements: given a rearrangement model , a permutation , and a non-negative integer k, decide if it possible to sort using, at most, k operations that belong to .
When , the problems of Sorting -Permutations by -Operations are NP-Hard (except for signed reversals), since they are equivalent to the sorting problems without restrictions in the size of operations. Next, we show that the problem is NP-Hard for a wide range of values. When the rearrangement model is clear from the context, it is omitted from the instance.
Theorem 1. The problem of Sorting λ-Permutations by λ-Operations is NP-Hard when , for positive constants c and , considering unsigned reversals, transpositions, transpositions and unsigned reversals, and transpositions and signed reversals.
Proof. Consider the model containing only unsigned reversals. The problem of Sorting Permutations by Unsigned Reversals is NP-Hard [
2]. Now, we show a reduction from this problem to the problem of Sorting
-Permutations by Unsigned
-Reversals when
, for positive constants
c and
.
Given an instance , where has size n, for Sorting Permutations by Unsigned Reversals, we create an instance , where , is a model with unsigned -reversals, and , such that , which is a -permutation. We note that, since c and d are constants, this is a polynomial-time reduction.
Now, we show that the instance is satisfied for Sorting Permutations by Unsigned Reversals iff .
(⇒) Suppose that there exists a sorting sequence S of reversals for with k reversals or less. Note that has size n, and all reversals in S are -reversals, since . The sequence S can sort the -permutation ; therefore, .
(⇐) Suppose that there exists a sorting sequence of -reversals for with k operations or less. We now construct a sorting sequence S for with k or less reversals. If all operations of affect only elements between the positions 1 and n of , then is also a sorting sequence for . Otherwise, there exists operations in which affect the elements in the interval . Let , where ℓ is the length of . We construct , where starts at the first element of , from left to right, which is in , and it ends at the last element of which is in . Since S inverts the elements of in the same order as they are inverted in by , this sequence sorts . Note that the length of S is less than or equal to ℓ because may have reversals which do not affect elements that are in , which corresponds to an empty reversal in S that can be ignored.
The proof is analogous for the other three models. □
2.2. Inversions
An inversion is defined as a pair of elements such that and . The number of inversions in is denoted by .
For example, the permutation has 6 inversions. The 3-reversal transforms into that has 3 inversions.
Lemma 1. For all λ-permutations and all , we have , , and .
Proof. A -reversal of length k can only change the inversions between elements of the segment affected by it. Therefore, the maximum number of inversions that a reversal of length k can remove is equal to , since this is the maximum number of inversions between the elements of a segment of length k. By definition, the length of a -reversal is less than or equal to , so a -reversal can remove, at most, inversions. Since the identity permutation is the only one with , any sorting sequence has to remove all inversions of the permutation, which results in the lower bound .
A -transposition of length also affects only the inversions between elements affected by it. Moreover, the inversions between elements on the segment are not changed. The same is valid for inversions between elements of the segment . So, if an inversion , with , is removed, then is in and is in . Therefore, the maximum number of inversions that a transposition can remove is equal to , since . This bound is maximum when , which results in . As before, this results in the lower bound .
Using a similar argument, we have . □
2.3. Breakpoints
An unsigned reversal breakpoint is defined as a pair of elements such that , for all , in the problems of Sorting Unsigned -Permutations by -Reversals and Sorting Unsigned -Permutations by -Reversals and -Transpositions. A transposition breakpoint or signed reversal breakpoint is defined as a pair of elements such that , for all , in the problems of Sorting Unsigned -Permutations by Transpositions, Sorting Signed -Permutations by -Reversals, and Sorting Signed -Permutations by -Reversals and -Transpositions. Considering a specific type of breakpoint, the number of breakpoints in is denoted by .
Fact 1. [21,22] For all λ-permutations and all , we have , , , , and . 2.4. Strips
A maximal subsequence without any breakpoints , for all , is called a strip.
For the problems of sorting unsigned -permutations, if the strip’s elements are in ascending (resp. descending) order, then we call it an increasing (resp. decreasing) strip. Strips containing only one element are considered to be increasing ones. For example, considering sorting unsigned -permutations by both operations and , we have two increasing strips and , and a decreasing strip . Note, however, that segment is not a decreasing strip for the problem of Sorting -Permutations by -Transpositions. This is because, direct from the definition of strips and breakpoints, there are no decreasing strips when this problem is being considered.
For the problems of sorting signed -permutations, if the strip’s elements are positive, then we call it an increasing strip. Otherwise, the strip is called a decreasing strip. Strips containing only one element are considered to be increasing ones if such an element is positive. Otherwise, they are considered to be decreasing strips. For example, considering sorting signed -permutations and , we have two increasing strips, and , and two decreasing strips, and . Note that, despite the elements of strip are in ascending order, it still is a decreasing strip, according to our definition.
The number of elements in a strip S of a -permutation is denoted by . We say that an element is out-of-place if .
Lemma 2. Let π be a λ-permutation, and let be the smallest element out-of-place in π (i.e., is minimum and ). The strip S that contains is such that .
Proof. First, suppose we have the increasing strip . Let be the segment of elements to the left of S. Note that the absolute value of any element in R is greater than any element in S. Therefore, , and then .
When we have in a decreasing strip, the proof is analogous. □
3. Inversions-Based Approximation Algorithms for Unsigned Permutations
In this section, we present approximation algorithms based on the concept of inversions for the sorting unsigned -permutations problems we are addressing.
The next lemma shows that it is always possible to remove at least one inversion from a -permutation by applying one -operation which results in a -permutation.
Lemma 3. For any λ-permutation , we can apply a 2-reversal or a 2-transposition to obtain a λ-permutation with inversions.
Proof. Let be the smallest element out-of-place in . Initially, note we have inversion , since and . Let be a 2-operation that swaps elements and , and let . It is easy to see that , since such inversion was removed, and note that there always exists a -reversal or a -transposition equivalent to , because the elements are adjacent, and, so, both only swap two elements.
Observe that, in , element is closer to its correct position, since it was moved to the left. Hence, we follow by showing that is a -permutation by considering two cases according to the values of .
If , then is also closer to its correct position, in . Otherwise, . Thus, element will be, in , one position away from its correct position. Then, note that because is a -permutation. In addition, observe that we have because , and because , and, so, . Therefore, is a -permutation, and the result follows. □
A generic greedy approximation algorithm for the three problems we are addressing in this section is presented in the next theorem. It receives an integer and a -permutation as input. It is greedy because it always tries to decrease the largest amount of inversions in . Since an unsorted permutation always contains inversions, the algorithm will finally sort .
Theorem 2. There exist -approximation algorithms for the problems of Sorting Unsigned λ-Permutations by λ-Reversals, by λ-Transpositions, and by λ-Reversals and λ-Transpositions.
Proof. Let be an integer, and let be a -permutation. Consider an algorithm which chooses a -operation , such that is a -permutation and is as small as possible, and then it applies such operation over . The algorithm repeats the same process over the resulting permutation until it reaches the identity permutation.
In the worst case, we always have one -operation reducing the number of inversions by one unit, as shown in Lemma 3. Therefore, the number of operations of such greedy algorithm is, at most, , and the approximation factor follows immediately from Lemma 1. □
Note that the distance is
because any unsigned permutation can be sorted with
-reversals or
-transpositions. For Sorting Unsigned Permutations by
-Reversals, at each step the algorithm considers
possible reversals that can be chosen. Since the variation in the number of inversions caused by an operation can be calculated in
time [
23], the algorithm has total time complexity
. Using the same analysis, we conclude that the algorithms involving transpositions have total time complexity
.
4. Algorithms Based on Inversions and Entropy for Signed -Permutations
The entropy of an element from a permutation is given by , that is, the distance between and its position in . We denote by the set of negative elements in such that is even, for all . Similarly, we denote by the set of positive elements in such that is odd. For example, given , we have and , since and are negative elements with even entropy , and is a positive element with odd entropy , respectively.
Fact 2. [16] Let π be a signed permutation, and let σ be a 2-reversal. We have . Fact 3. [24] Let π be a signed permutation, and let σ be a λ-operation. We have . The score of a -operation over a signed permutation is given by .
Combining Fact 3 with the fact that a -reversal and a -transposition can remove, at most, inversions, we have an upper bound for the score of a permutation in Lemma 4. This upper bound implies Corollary 1.
Lemma 4. Let π be a signed permutation. We have .
Corollary 1. For any signed λ-permutation , , and , we have .
We follow by showing in Lemma 5 that there always exists a -operation with score at least 1. Having this in mind, a generic greedy approximation algorithm for the two sorting signed permutations problems we are addressing is presented in next theorem. It receives an integer and a -permutation as input. It is greedy because it always chooses a -operation with the largest score. Since the only permutation with is the identity, it will, eventually, sort .
Lemma 5. For any signed permutation and , there always exists a λ-operation σ such that is a λ-permutation and .
Proof. The proof is divided into two cases, according to .
First, consider . Note that, in this case, we only have zero entropy elements in , implying that . So, by applying an unitary -reversal over a negative element, we get a -permutation such that . Since such an operation holds , we have .
For , Lemma 3 shows how to decrease the amount of inversions by applying one 2-reversal and, since Fact 2 shows that , we have .
Note that, in both cases, the resulting permutation is also a -permutation. □
Theorem 3. There exist -approximation algorithms for the problems of Sorting Signed λ-Permutations by λ-Reversals and Sorting Signed λ-Permutations by λ-Reversals and λ-Transpositions.
Proof. Let be an integer, and let be a -permutation. Consider an algorithm which chooses the -operation such that is a -permutation and is as great as possible and then it applies such operation over . The algorithm repeats the same process in the resulting permutation until it reaches the identity permutation.
In the worst case, as shown in Lemma 5, we always have one -reversal with score 1. Therefore, the number of operations of such greedy algorithm is, at most, , and the approximation factor follows immediately from Corollary 1. □
Since is and the time to calculate the variation in the number of is , the analysis of time complexity is analogous to the algorithms previously presented for the sorting unsigned permutation problems.
5. Breakpoints-Based Approximation Algorithms
In this section, we present approximation algorithms based on the concept of breakpoints for the five problems we are addressing.
In the next lemma, we suppose that the smallest element out-of-place is in an increasing strip of a -permutation . We show how to reduce the number of breakpoints of by moving this strip to its correct position using an operation that may not be a -operation, but assuring that the resulting permutation is also a -permutation.
Lemma 6. Let π be a λ-permutation. Let be the smallest element out-of-place in π. Suppose that is in an increasing strip . Then, is a λ-permutation, , and .
Proof. Let be the segment of elements in that will be transposed with S. Observe that the absolute value of any element in R is greater than any element in S, so is a -permutation, since greater elements (in their absolute values) are moved to the right and smaller elements to the left. In addition, observe that, in , we have the three breakpoints , , and , where the first one is because and , and the second and third ones are because the strip’s start and strip’s end are at positions j and k, respectively. Transposition moves the elements of S to their correct positions by transposing them with elements of R, thus removing at least breakpoint . Since a transposition can add, at most, three breakpoints, but we already had all of them and we removed at least , we have .
By Lemma 2, we have ; thus, . Since is a -permutation, we have , and, by construction, ; thus, . Therefore, . □
For a -permutation with only increasing strips, the next lemma shows that it is possible to find a sequence with, at most, 4 transpositions that decreases the number of breakpoints in , assuring that all permutations generated in the process are -permutations. Using these operations, we can create an algorithm that sorts using, at most, -transpositions.
Lemma 7. Let π be a λ-permutation. Let be the smallest element out-of-place in π. Suppose that is in an increasing strip . It is always possible to obtain a λ-permutation with, at most, breakpoints by applying, at most, 4 λ-transpositions such that all intermediary permutations are λ-permutations.
Proof. Let be the segment that will be moved to the right in . Note that , by Lemma 2, and , because is a -permutation.
The idea is to apply a sequence with, at most, four -transpositions that divide both segments and into, at most, two parts each, where each part has, at most, elements, and then exchange each part of S, at most, twice (and at least once), with the (possible) two parts of R. If we had exactly elements in each of S and R, such sequence would be .
Now, we have to show that, after each of the, at most, four operations is applied, we have a -permutation as result.
Observe that the absolute value of any element in R is greater than any element in S. Since each -transposition puts elements of S closer to their correct positions by transposing them with greater elements (in their absolute values) of R, we have a -permutation after each -operation applied. After all -transpositions are applied, the elements of S are at positions from i to , and the elements of R are at positions from to k, resulting in , which is a -permutation with at least breakpoints, as shown in Lemma 6. □
For a -permutation with only increasing strips, the next lemma shows how to decrease the number of breakpoints in using only -reversals, also assuring that all permutations generated in the process are -permutations.
Lemma 8. Let π be a λ-permutation. Let be the smallest element out-of-place in π. Suppose that π only has increasing strips and that is in a strip . It is always possible to obtain a λ-permutation with, at most, breakpoints by applying, at most, λ-reversals such that all intermediary permutations are λ-permutations.
Proof. Let be the segment that will be moved to the right in . Note that , by Lemma 2, and , because is a -permutation.
The idea is to move elements from S to their correct positions by applying, at most, two sequences of pairs of -reversals, where each one puts, at most, elements of S in their correct positions at a time.
In the first sequence of -reversals, there are two possibilities. If , then the first operation of each pair reverses elements contained in both S and R. If , then it reverses elements contained in both S and R. In any case, the second operation of each pair reverses back the elements of R affected by the first one, in order to leave with only increasing strips again (except for the elements of S which were affected by the first operation).
After the sequence is applied, we have, at most, elements of S from positions i to , and, maybe, they are in a decreasing strip. If this is the case, then one more -reversal has to be applied to put these elements in their correct places, by reversing such decreasing strip.
The second sequence of -reversals puts the (at most) remaining elements of S in their correct positions, following the same idea, and, also, maybe one extra -reversal will be necessary after it is applied. Note that, if there are no remaining elements (in case of ), this sequence is not necessary.
The largest amount of operations needed in the process described above happens when we have elements in S. Since , our process starts by moving the first elements of S to their correct positions. Observe that, for each pair of reversals applied in the first sequence, one of them moves elements positions to the left (except, maybe, by the last pair), and then the other one reverses again the elements of R affected by the first reversal. Besides that, by the definition of -permutations, the elements of S are, at most, positions away from their correct positions, and, so, at most, 2 pairs of reversals are needed to put them at position i. As we have told before, maybe such elements of S ended up in reversed order and, in order to fix it, one more reversal is needed. Hence, we already have a total of, at most, 5 reversals applied.
To move the remaining element of S to its correct position, all the -reversals of the second sequence will have size 2 (note that, in this case, we do not need the second operation of each pair), which means such element will be moved only 1 position to the left per operation, giving an extra amount of -reversals. Therefore, the number of -reversals to move S to its correct position is, at most, .
Now, we have to show that, after applying each operation, we have a -permutation as result and, after the last operation is applied, we have .
Observe that any element in R is greater than any element in S. Then, since the first operation of each pair moves elements of R to the right and elements of S to the left, all elements affected will be closer to their correct positions, resulting in a -permutation. The second operation of each pair reverses elements of R to ascending order again, so it also results in a -permutation. After both sequences of -reversals are applied, all elements of S are at positions from i to and all elements of R are at positions from to k, resulting in , which is a -permutation with at least one less breakpoint than , as shown in Lemma 6. □
Lemma 9 shows how to decrease the number of breakpoints of a -permutation , considering that the smallest element out-of-place of is in a decreasing strip. Since Lemma 8 deals with the case where has no decreasing strips, we can use these two lemmas together to develop an algorithm to sort a -permutation using only -reversals.
Lemma 9. Let be the smallest element out-of-place in a λ-permutation π. Suppose that is in a decreasing strip . It is always possible to obtain a λ-permutation with, at most, breakpoints by applying, at most, one λ-transposition and one λ-reversal.
Proof. When , one reversal put elements of S in their correct positions. It is easy to see that is a -permutation and, since by Lemma 2, we have that is a -reversal.
Now, assume . Note that, in this case, we have the three breakpoints , , and , where the first one is because and , and the second and third ones are because the strip’s start and strip’s end are at positions k and j, respectively. Thus, we can apply the -transposition followed by the -reversal to obtain , since a -transposition can add, at most, three breakpoints but we already had , , and , and the second -reversal can add, at most, two breakpoints, but we already had and and we removed the first one, since all elements of S will be in their correct positions in .
Now, we have to show that, after each operation is applied, we have a -permutation as result.
Let be the segment of elements that should be moved in order to put S in its correct position. Observe that the absolute value of any element in R is greater than any element in S. The first operation, a -transposition, transposes S only with greater elements (in their absolute values); thus, the result is a -permutation. The second operation, a -reversal, just reverses a decreasing strip to put the elements of S in their correct positions; thus, it also results in a -permutation. Hence, we have as result a -permutation with at least one less breakpoint. □
The next theorems describe approximation algorithms for the problems we are addressing. Lemma 10 is auxiliary to Theorem 4. The algorithms receive an integer and a -permutation as input. The goal is to decrease at least one unit on the number of breakpoints in by moving elements to their correct positions (applying Lemmas 7 and 9). Since the only permutation with no breakpoints is the identity, they will, eventually, sort .
Lemma 10. Let π be a λ-permutation. Let be a decreasing strip in π (thus, ). Let be the resulting permutation after reversing S in π. Then, is a λ-permutation.
Proof. First, note that element is to the right of element . We show that the lemma follows by considering four cases, according to the positions of elements i and j in relation with the elements and .
Case (i), : note that both and are to the left of S. Then, after reversing S, element i is closer to its correct position, while element j is moved away from its correct position. Despite this, the distance between and j in is smaller than the distance between and i in , and, so, if is a -permutation, is also a -permutation.
Case (ii), : note that is to the left of S, and is in S. Then, after reversing S, the element i is closer to its correct position and the distance of j to its correct position will still be less than , since the size of S is, at most, , as Lemma 2 shows.
Case (iii), : similar to (ii).
Case (iv), : similar to (i). □
Theorem 4. The problems of Sorting (Unsigned or Signed) λ-Permutations by λ-Reversals have -approximation algorithms.
Proof. Let be an integer and be a -permutation. Consider an algorithm which first applies one -reversal over each decreasing strip of in order to get a -permutation with only increasing strips. By Lemma 10, we guarantee that all intermediary permutations generated by these -reversals are -permutations.
Then, the algorithm will repeatedly take the smallest element out-of-place and move the increasing strip that contains it to its correct position, obtaining a -permutation with at least one less breakpoint, until it reaches the identity permutation.
As shown in Lemma 8, at most
-reversals are needed to move each strip to its correct position. Since, maybe, one extra
-reversal could have been applied in the beginning of the algorithm to transform such strip into an increasing one, we have that, at most,
-reversals can be applied to remove at least one breakpoint. Therefore, the number of operations of our algorithm is, at most,
for
, and the inequality follows from Fact 1. □
Theorem 5. The problem of Sorting λ-Permutations by λ-Transpositions has a 12-approximation algorithm.
Proof. Let be an integer, and let be a -permutation. The algorithm will repeatedly take the smallest element out-of-place and move the increasing strip that contains such element to its correct position, obtaining a -permutation with at least one less breakpoint, until it reaches the identity permutation.
As shown in Lemma 7, at most, 4 -transpositions are needed to move each strip to its correct position. Then, in the worst case, we remove 1 breakpoint every 4 -transpositions applied. With this and Fact 1, the number of operations of our algorithm is, at most, . □
Theorem 6. The problems of Sorting (Unsigned or Signed) λ-Permutations by λ-Reversals and λ-Transpositions have 12-approximation algorithms.
Proof. Let be an integer, and let be a -permutation. Let be the smallest element out-of-place in .
We have two cases to consider: when the strip which contains is decreasing or not. In both cases, we can at least remove breakpoint from without adding other ones by applying, at most, 4 -transpositions (if the strip is increasing) or, at most, 2 -operations (if the strip is decreasing), as shown in Lemmas 7 and 9, respectively.
Then, considering both cases described, the algorithm will repeatedly take the smallest element out-of-place and move the strip that contains it to its correct position, decreasing at least one breakpoint at a time, until it reaches the identity permutation.
Note that, in the worst case, we remove 1 breakpoint every 4 -transpositions, and, so, the result is analogous to Theorem 5. □
Note that is and the time complexity to find the strip with the smallest element out-of-place in is . So, we conclude that the time complexity for the -approximation algorithms and the -approximation algorithm are and , respectively.
6. Experimental Results
In order to analyze how the proposed algorithms work from a practical perspective, we have implemented the inversions-based (Theorems 2 and 3) and the breakpoints-based (Theorems 4–6) greedy algorithms.
We can also achieve an approximation factor of
with algorithms that always apply an operation of size 2 according to Lemma 3, for unsigned permutations, or Lemma 5, for signed permutations. There exists a way to implement such algorithms so that they have time complexity of
[
25]. We also implemented these algorithms, which we named super short algorithms, for comparison purposes. The difference between the super short algorithms and the inversions-based algorithms from Theorems 2 and 3 is that the super short algorithms only use operations of size 2, while the algorithms from Theorems 2 and 3 use, at each iteration, the operation that most decrease the number of inversions, which can have size, at most,
.
We performed experiments considering a total of 1000 signed and unsigned -permutations, with size equal to 100 and values of , as input for the algorithms. The considered -permutations were generated in two different ways: (i) totally random, and (ii) by applying 20 random -operations (according to the rearrangement model of each problem) over the identity. We reinforce that all generated permutations in both cases, including the intermediary ones of case (ii), are -permutations. For (i), the -permutations tend to have a high amount of breakpoints, and for (ii) they tend to have a high amount of strips and, moreover, we know the distance is, at most, 20. Then, we compared the results according to the average and maximum approximation factors obtained for all permutations. For each permutation, we calculated the approximation factor by dividing the size of the sorting sequence by the maximum value of lower bound between the ones shown in Lemma 1 and Fact 1.
Even though the inversions-based and the super short algorithms have the same theoretical approximation factor, the experiments show that the inversions-based algorithms are much better in practice. Hence, for the rest of this analysis, we only compare the results of the inversions-based and breakpoints-based algorithms. Considering the results for totally random -permutations, we observed that the maximum approximation factors for sorting unsigned -permutations problems were and for -reversals, and for -transpositions, and and for when both -operations are allowed, considering the breakpoints-based and the inversions-based algorithms, respectively. For the sorting signed -permutations problems, the maximum approximation factors were and for -reversals and and for when both -operations are allowed, considering the breakpoints-based and the inversions-based algorithms, respectively.
Another observation is that, for totally random -permutations, the average approximation factor of the inversions-based algorithm and the breakpoints-based algorithm were similar (except for Sorting Signed -Permutations by -Reversals), even with the relevant difference among their theoretical approximation factors.
Regarding the results for random -permutations generated from the identity, the maximum approximation factors for sorting unsigned -permutations problems were and for -reversals, and for -transpositions, and and for when both -operations are allowed, considering the breakpoints-based and the inversions-based algorithm, respectively. For the sorting signed -permutations problems, the maximum approximation factors were and for -reversals and and for when both -operations are allowed, considering the breakpoints-based and the inversions-based algorithm, respectively.
Furthermore, for this second type of -permutations, the maximum approximation factors of the inversions-based algorithm for the sorting signed -permutations problems and for the Sorting Unsigned -Permutations by -Reversals were considerably greater when compared with their average approximation factors. Despite that, the average and maximum factors given by the inversions-based algorithm were better for the two sorting unsigned -permutations problems which allow -transpositions. For both problems which allow only -reversals, the breakpoints-based algorithm had better average and maximum approximation factors. For Sorting Signed -Permutations by -Reversals and -Transpositions, both algorithms had similar average approximation factor, but the breakpoints-based algorithm had a better and more constant maximum approximation factor.
Our experiments reveal that the average approximation factor for the breakpoint-based algorithms slight increases for the interval , while the average approximation for the inversion-based algorithms has a greater increase in the same interval. This occurred because there are more -reversals that decreases a breakpoint, when using bigger values of , since the strips tend to get smaller compared to the value of , which is a crucial point in Lemmas 8 and 9. A similar relation is not so common for inversions, since extending the segment to be inverted may lead to inversions being added.