2. Materials and Methods
KLEIN cipher
KLEIN is a lightweight block cipher of the permutation substitution network type proposed in [
10]. It takes a 64-bit plaintext block as input and returns a 64-bit ciphertext block. Three variants, called KLEIN-64, KLEIN-80, and KLEIN-96, differ in key size: 64, 80, and 96 bits, respectively. All variants internally process 64-bit text blocks. The number of rounds is also different: 12, 16, and 20 for each variant. This algorithm has been extended in several ways. For example, in [
11], the key expansion algorithm was modified and a more advanced scheme called N-KLEIN was introduced. In addition, a quantum circuit was implemented on the S-box using the in-place method, which reduced the width and depth of the circuit, thereby improving the implementation efficiency of the quantum circuit. For more details on this block cipher, see, for example, [
12,
13,
14,
15,
16].
Genetic Algorithm
This section briefly describes the GA scheme used in this paper. In Algorithm 1, the individuals of the population are elements of the key space taken as binary blocks. By selecting the s parents, a subset S of is obtained. The parents are selected using the tournament method, where two individuals are randomly chosen and the one with the highest fitness is selected. Elements of the set S are crossed, and the descendants are added to if they are not already members. For crossover, the two-point crossover is used, and the probability of two individuals crossing over was set to 0.6 for all experiments. The mutate operation randomly mutates up to three components of binary blocks, with a mutation ratio of 0.2 for all experiments.
Let
F be a fitness function; an individual
of the population is better adapted than another individual
, if it has greater fitness, i.e., if
. The following fitness functions are used (see [
17]). Let
where
and
are block ciphers,
T is plaintext,
K is a key, and
C is the corresponding ciphertext, such that,
. The function, based on the Hamming distance
, for a certain individual
X of the population is,
which measures the closeness between the ciphertexts
C and the text obtained from encrypting
T with the probable key
X.
Algorithm 1 Genetic Algorithm |
Input: m (number of individuals in the population), F (fitness function), g (number of generations), s (number of individuals selected to mate). Output: the individuals with the highest fitness function as the best solution.
- 1:
Generate randomly an initial population with m individuals. - 2:
Calculate , (the fitness of each individual of ). - 3:
while no solution found or g generations not reached do - 4:
Select s parents of . - 5:
To apply the Crossover operator to the s selected elements and generate offspring pairs. - 6:
Mutate each of the resulting descendants. - 7:
Compute the fitness function F for each of the offspring and their mutations. - 8:
Using the Tournament Method between two, based on the aptitudes of the parents and offspring, decide what will be the new population for the next generation, selecting two individuals at random each time and choosing the higher fitness. - 9:
end while
|
The next fitness function is based on measuring the closeness between ciphertexts, but on their representation in decimal rather than binary. Let
be the corresponding decimal conversion of the binary block
Y; the function is
For the GA specification on block ciphers and other details on the values of the operators and parameters used in the experiments of this paper, see Section 3 of [
18] and Section 2.1 of [
17].
BBM and TBB key space partition methodologies
The BBM and TBB key space partitioning methods allow the GA to work on a particular subset of the set of admissible solutions as if it were the full set. The partitioning into equivalence classes allows it to use this algorithm in parallel, in multiple classes simultaneously and independently.
Let be the key space of length and be such that , , and . Then, in both methodologies, the formulas to represent the elements of are identical, i.e.: with
Both methodologies involve running the GA on a subset of the key space rather than the entire key space. In the case of BBM, the subset is associated with the class of keys that share the same quotient (
q). The TBB method works with a subset defined by keys that share the same remainder (
r), where the elements of each class are distributed across the set of keys. The parameters
q,
r,
, and
have a dual role in both methods. To avoid ambiguity in the notation, we refer to the parameters of the BBM methodology as
q,
r,
, and
. While
,
,
and
are used to refer to the same parameters in the TBB methodology. See [
17,
18] for more details on these methods.
In [
17], the authors work with the BBM and TBB methodologies; furthermore, the study of certain parameters that intervene in GAs was carried out, such as the time it takes to execute a certain number of iterations; several fitness functions were introduced; and which ones led to better results was analyzed. The experiments were carried out with the block ciphers AES(
t). Conversely, in the present investigation, BBM and TBB methodologies are combined to crate a new methodology for key space partitioning; in addition, other methodologies are proposed.
The focus of this research is to propose different methodologies for partitioning the key space. These methodologies can be used with different ciphers and different search algorithms because the main utility is that they allow parallel search in different classes simultaneously. The procedure is adaptable and can be applied to any search algorithm, including brute force search with parallel techniques. For this reason, no comparisons are made between KLEIN and other ciphers, nor with GA, because the same would apply to the others. For this reason, a black box attack is used in the thesis. The goal of the work is to focus on the methods, and they are applied to the search in KLEIN using GA as an example. In an exhaustive search, it is guaranteed that the sought key can be found if the whole space can be traversed in a reasonable time. Genetic algorithms (GAs) are often more efficient than an exhaustive search for several reasons:
Efficient exploration of the search space
Evolutionary mechanism: GAs use principles of natural selection, which allow them to explore the search space more efficiently by focusing on the most promising solutions and combining features of those solutions.
Populations: Instead of evaluating a single solution at a time, GAs work with a population of solutions, allowing multiple regions of the search space to be explored simultaneously.
Eliminating Unpromising Solutions
Natural Selection: Through selection, GAs eliminate less effective solutions and focus on those that perform better. This significantly reduces the number of evaluations required compared to an exhaustive search, which evaluates all possible solutions.
Genetic Operators.
Fast Convergence.
Adaptability.
Reduced Computational Complexity.
Genetic algorithms use an adaptive and evolutionary approach that allows for smarter and faster exploration of the search space, eliminating the need to evaluate each possible solution individually. This makes them a powerful tool for solving complex problems where the solution space is large and difficult to manage.
Proposed key space partition methodologies
Next, as an introduction to the proposed key space partition methodologies, an application of the TBB and BBM methodologies is demonstrated for attacking block ciphers when some bits of the key are known. The basic idea is to fill in some of the remaining bits. Note that the selection of () and the class in which the search is performed is equivalent to setting the final bits of the key in the TBB methodology and the initial bits in the BBM. In that context, if the first or last l bits of the key were known (or wanted to be set as known, which is common in the context of cryptanalysis), then the partition would be created by implementing or . However, the most frequent issue is that some non-consecutive bits are identified, while others are missing to complete the block. In this case, the first (or last) l bits should be filled in by finding all combinations of the components that are unknown in that sub-block of length l.
For example, suppose that for a certain key,
K, the first 19 bits are known (the left-most significant bit is taken),
:
with
. So, to find the remaining bits, and therefore, the complete key, the partition can be made by taking
, and the chosen class is the conversion to decimal of the sub-block formed by the first 19 bits,
Following the idea of completing bits, the size of the class in which the search is performed could be reduced by completing one or more of the following bits. In this case, since the bits can only take two values, and therefore, the next component,
, can only be 1 or 0, we could take
and search in two classes (with fewer elements):
Note that in these cases, key space partition methodologies are advantageous because they allow parallel searching of several classes at the same time. In the case where some bits are known discontinuously, one can proceed in a similar way. Suppose the key has the following structure:
where it can be seen that 17 bits of the total 64 are known. The largest number of known bits are in the first part of the block, so the BBM methodology must be used again and, therefore, partitioned by choosing
, which is the range between the first and last known bit. In this range, the bits
,
, and
are missing. Filling in these bits, there are
classes in which to search for the key. It is clear that the key is in one of these classes. Note that the above is true for both methods. The examples were performed by setting the first few bits. In the case of the last bits, the TBB method would be used in a similar way, where the class would be chosen as
equal to the decimal conversion of the last sub-block of the key.
Combined Methodology of Key Space Partition (CoMeKSPar)
With the information provided in the previous section, one can appreciate the usefulness of a tool for cryptanalysis when certain bits of the key are known or need to be fixed. The proposal reflects the complementary nature of the BBM and TBB methodologies. However, there is a problem with knowing both the start and end bits at the same time. The BBM and TBB methodologies would not solve the situation based on the information revealed so far. By using either of them, knowledge of the first or last bits would be sacrificed. This is because when a portion of the block is fixed, the search is performed on the remaining portion as a whole.
A possible alternative would be to use a fitness function that takes advantage of knowledge of certain components of the key. However, this approach would have two challenges. First, it would limit the number of fitness functions the GA could use within the same partition. On the other hand, the fitness function takes into account the known bits, but the search performed by the GA is blind to this information. In other words, it is as if no bit of the part of the key that was not set during the partition (where the GA searches) is known.
With regard to this problem, a key space partitioning methodology is proposed in this section. This methodology allows us to fix the first and the last bits of the key simultaneously. In this way, the search is performed in the remaining central bits, which is the main benefit of this proposal. This is a combination of the BBM and TBB methodologies, which is referred to as CoMeKSPar (Combined Methodology of Key Space Partition).
The general idea is to first apply the TBB methodology to the entire key space and then apply the BBM methodology to the subset
Q of the TBB over which
moves. When we use the TBB methodology, the elements of the space
are obtained by the expression
where
and
; in particular, note that
. Now, in this methodology,
is set to choose the class, which is equivalent to setting the last
components, and then
varies by
Q to move through the elements of said class.
The next thing is to apply the BBM methodology on the set
Q as if it were the entire space. Let
,
. Dividing
by
, the set
Q is divided into
subsets, with
elements each. Taking
then, an element
is expressed as
Note that q and r are the same parameters of the BBM methodology, only the space has been reduced to Q. With q, the subinterval (or class) is fixed, and with r, the position within it. Now the search is performed in the set , where r is free. Note that by choosing q, the first bits of the key are being set to .
Now, to recover the complete key in
, we substitute (
11) in (
9), from which we obtain
where
,
,
q, and
are fixed, and only
r varies by
. With the above, it is guaranteed to be able to fix the last
and the first
bits of the key. It is interesting to note that it is equivalent to applying the BBM methodology first and then the TBB. However, it does not make sense to use the same methodology twice, since it would be equivalent to using it only once.
General Algorithm of Key Space Reduction (GAKSRed)
The most general case is when the known bits of the key are distributed over the block of length . This includes that some components are continuous. This section proposes a key space partitioning methodology that allows searching with the GA in the space created by the unknown bits of the key, regardless of their distribution in the binary block. This methodology is referred to as GAKSRed (General Algorithm of Key Space Reduction).
Let
be the space in keys of length
. Let
be a key. From now on,
denotes the value of the component that occupies the position
in the binary block. Suppose that from
B, the following
l bits distributed randomly throughout the block are known:
with
,
and
. Let us denote as
the list formed by these components, in that order,
and as
, to the list of the indexes:
The components are always taken in ascending order in relation to their position in the block, i.e., if and only if On the other hand, the first position is occupied by the leftmost bit of B, and the last bit is the rightmost bit of B.
By fixing the
l bits of (
13), the entire key is recovered if all combinations of the remaining
bits that are not known are found and evaluated. This would give a total of
possible combinations. They would go from making all (unknown) bits equal to 0 to all equal to 1. Therefore, in decimal base, it would be equivalent to searching from 0 to
. The only problem would be that these unknown components are not continuous, but are scattered throughout
B forming several sub-blocks, for this reason, it is important to pay attention to the position occupied by the bits in the block.
Let
be the unknown bits of
B. Where
,
, and
. As explained above, a sub-block of
components is formed by concatenating these bits. Let us denote as
the list formed by these components, in that order,
and as
, to the list of the indexes:
Now the space in which the search is performed is obtained by calculating the combinations of this sub-block. This is equivalent to searching the space .
Note that is isomorphic to ; however, the notation change is due to the different way of obtaining the set . In this sense, what has been carried out so far is (1) separating from B the unknown bits, saving their positions in the block; (2) concatenating these components; (3) and performing the search, using the GA (or another algorithm), in the set . In other words, with , we are referring to all the possible combinations of the unknown components of B. The elements of represent the decimal conversion of blocks of length . We select a class by fixing the l known bits of B, and the search space is reduced to .
To retrieve B from and , the function is suggested. This function creates a binary block of length in which it places the components of and , taking into account the place (indexed by and , respectively) corresponding to each bit. In the programming of this methodology, it is possible to have, in the pre-calculation phase, with the components of in their position; therefore, in , it would not be necessary to create a new variable each time.
A way to obtain an element B could be as follows. The bits of B are traversed sequentially: , . Now, if , then first return the place that i occupies in , its index: , then . Otherwise, if , then , and then .
Now, similar to the previous methodologies, given
, it is necessary to have a way to search for the element that represents
v in
. For this purpose,
v must first be converted to a binary block of length
. Let
V be the binary block of
v:
Each one of the bits of V is inserted in the corresponding components that occupy the positions in B, for they, together with the bits already known (and fixed from the beginning), form the element of that is represented by v in . For this reason, the positions (indexes) of the known bits and the remaining bits must be saved. In other words, the idea is to apply the function taking V as if it were :
The GAKSRed methodology also constitutes a formalization of the procedure of iterative fixing components of the keys used in [
19] to design a method of cryptanalysis of PRESENT-80 using the genetic algorithm, progressively reducing the set of possible solutions.
Let us look at the following example. Suppose
bits of a key
B of a hypothetical length
bits are known:
By separating the known bits and the rest, together with their corresponding indexes, we have
Since
has
components, then the search is performed in the space
. Suppose now that we have
, which in binary would be
Applying the
function, we obtain
where
is the key that represents
V in
. Any variation in
V is also made in
in the components for which no information is known. Looping exhaustively through the set
is equivalent to searching for all possible combinations in
B while keeping the known bits fixed.
Note that this methodology is not isolated from the other three. If the set is still too large, it can be reduced again by applying either of the BBM or TBB methodologies (hence the CoMeKSPar). The above is possible because This feature would enable parallel searches in different classes simultaneously. Note that these methodologies do not influence the size of the space (or subset) where the search is carried out with the GA. In other words, if d bits of a key are not known, the subset where the key is searched is elements regardless of the methodology used. One methodology will be chosen based on the distribution of the unknown bits.
3. Experiments and Discussion
Attack with the BBM and TBB Methodologies
A personal computer (PC) laptop with the following specifications was used for the experiments: Intel(R) Celeron(R) CPU N3050 @1.60 GHz (2 CPUs), ∼1.6 GHz, and 4 GB of RAM memory. The attack is a known plaintext attack and a black box attack. A total of 60 attempts were made to find the key, with 30 attempts for each space partitioning methodology. Of the 30 attempts, 15 were made using each of the fitness functions and . At the same time, out of the 15, 10 were made for and 5 for . The total number of generations the GA has to go through is 163 for and 655 for , and the number of individuals in the population is 100 (the population size) for all cases. In all trials, the classes in which the key was found were searched. The key was found in 38 out of 60 attempts, resulting in a success rate of .
For the BBM methodology with functions and , for , the key was found in 7 and 4 out of 10 attempts, respectively. In both cases, for , the key was found in 4 out of 5 attempts. In total, the BBM methodology found the solution in 19 out of 30 attempts, for of correct answers. On average, it took 87.1 generations and about 28.1 min to find the solution with . The key was found using after 112.1 generations and 35.6 min.
In the case of the TBB methodology, out of 10 attempts for , a positive solution was found in 5 attempts for the function and in 6 attempts for the function . For , the key was found in 5 and 3 tries. As in the first methodology, the solution was found in 19 out of 30 attempts for of correct answers. On average, it took about 90.8 generations and 27.9 min to find the solution with . With , the key was found in about 103.6 generations and 32.4 min. Note that even though more generations were needed than in BBM, the average time was slightly less in this case.
In both methodologies, in cases where the key was not found, the times were as follows. For and , the GA took an average of 47 min and 3.37 h, respectively.
Attack with the CoMeKSPar methodology
The experiments were performed with the same parameters and conditions as before. Out of the 10 attempts for , a positive solution was obtained in 5 and 9 attempts. As for , the key was found in 5 and 4 attempts.
The solution was found in 23 out of 30 attempts for a success rate of . On average, it took about 176.1 generations and 55.26 min to find the solution for . The key for was found after 111.77 generations and 33.97 min. Note that required more generations than the BBM and TBB methodologies and, therefore, more time. The results for were similar. On the other hand, it is worth noting that the effectiveness of CoMeKSPar was greater than the results obtained with BBM and TBB. In cases where the key was not found, the times were similar to those obtained with the BBM and TBB methodologies. Interestingly, all of the methodologies complement each other in terms of utility by fixing some known bits of the key when performing the partition.
Experiments with GAKSRed would give similar results to CoMeKSPar. The difference is that GAKSRed would work with any distribution in the binary block of the known bits of the key, but the behavior in terms of experiments and the size of the search space would be similar to that of CoMeKSPar (depending on the number of bits assumed to be known).
These methodologies are new compared to the previous works, such as [
17,
18]. The choice of encryption is also different; in the case of this article, it is KLEIN, another encryption method with moderate parameters.
These experiments show that it is possible to divide the key space into classes and find the keys in use. At the same time, the results confirm the usefulness of the proposed methodologies. An advantage of these methodologies is that they allow the use of parallel computing. This allows us to search simultaneously in different sub-blocks of bits of the key, which increases the probability of success.