1. Introduction
Solving classification problems is of interest for both practical and theoretical reasons. Already, Felix Klein in his Erlanger Programm (1872) [
1] described mathematics as the study of properties of sets that remain invariant under certain specified groups of transformations. The classification of codes with given parameters and/or properties is one of the main problems in Coding theory. Two equivalent codes are considered to be the same code, and one can replace the other if it is more convenient in the corresponding practical application. For example, we say that there is only one Reed–Muller
binary code, but in fact there are many different codes with these parameters and they are all equivalent to each other. Some algorithms for construction and/or classification of linear codes use the set of all inequivalent codes with smaller parameters and/or given properties. For more theoretical and practical issues of the classification problem we refer to the book of Kaski and Östergård,
Classification Algorithms for Codes and Designs [
2]. The equivalence test is the main part in any classification algorithm.
In this paper, we describe the algorithm implemented in the program
LCequivalence, which can be downloaded from the website [
3]. The program uses as input a file with generator matrices of linear codes over a finite field
with
elements and gives as output a maximum set of inequivalent codes among them. Moreover, it calculates the order of the automorphism group of a codes and presents the orbits of the coordinates with respect to this group.
Although there are many classification results, the known algorithms for equivalence of linear codes that are implemented in a software are the algorithm of J. Leon [
4] (implemented in
Magma [
5] and
GAP [
6]), the algorithm of Thomas Feulner [
7] (implemented in
SageMath [
8]) and Bouyukliev’s algorithm [
9] (implemented in
Q-Extension). An algorithm was proposed by N. Sendrier in [
10], but his algorithm has mostly theoretical value. There are many papers considering the complexity of the code equivalence problem (we distinguish the papers [
11,
12]). Each of the respective programs has its advantages and limitations, and in our opinion it is good to have more choice so that users can test and use the software that is most suitable for their particular research.
Leon’s algorithm checks two codes for equivalence and computes automorphism groups. It is implemented in the computer algebra system Magma, which is not freely available, and to compare more codes, a program must be written for it. The same algorithm is implemented in the free computer algebra system GAP, but only for the binary case, i.e., for linear codes over . Feulner’s program can only be used via the computer algebra system SageMath (as far as we know), so if one wants to use this algorithm one must be familiar with that system. The algorithm implemented in Q-Extension works with codes over fields with up to 5 elements.
As main advantages of the program LCequivalence, we can point out the following: (1) it can be used to find the inequivalent among a huge number of linear codes; (2) it works for codes over prime and composite fields with elements; (3) the limits on the length and dimension of the considered codes depend only on the used hardware and the computational time (not on the algorithm). As far as we know, there is no other program for the equivalence test of a large number of linear codes. The algorithms implemented in popular packages for computations with linear codes, such as Magma and GAP package GUAVA, check pairs of codes for equivalence, and if the researcher needs to test more codes, he/she has to write a program to use this check.
The main idea of the algorithm is to associate each code (regardless of what field it is over) with a binary matrix, so that two codes are equivalent if and only if the corresponding binary matrices are isomorphic. A similar idea was used in [
9], but the problem there is that not every automorphism of the binary matrix used is an automorphism of the code, therefore additional verification is needed.
Algebraically, an equivalence relation can be considered in terms of the action of a finite group
G on a set of objects
. Then, the equivalence classes coincide with the orbits under this action, so two objects are equivalent if and only if they belong to the same orbit of
G on
[
2]. The efficiency of some algorithms depends mostly on the order of the group
G, and if it is too large, the equivalence test becomes a very hard problem.
We use an algorithm that involves obtaining canonical forms for the objects using a canonical representative map. To check whether
we only need to compare their canonical forms. This approach is used in [
9,
13]. In most cases, algorithms of this type are more efficient and faster.
The paper is organized as follows. In
Section 2, we present some important definitions that we need in our research. In
Section 3, we discuss the representation of the linear codes as binary matrices. This representation is very important for the algorithm which is described in the following
Section 4. Some computational results are shown in
Section 5. In the end, before the references, we give a small conclusion.
2. Preliminaries
In this section, we present the most important definitions that we need in our work. By , we denote the n-dimensional vector space over the finite field with q elements. As a metric, we use the Hamming distance between two vectors of , which is defined as the number of coordinates in which they differ. Any k-dimensional linear subspace of is called a linear code of length n and dimension k. The vectors in a code are called codewords, and the minimum among the distances between two different codewords is called the minimum distance of the code. If a linear code over has length n, dimension k and minimum distance d, we say that it is an code. Any matrix of rank k whose rows are codewords from C is called its generator matrix.
Definition 1. We say that two linear q-ary codes and of the same length and dimension are equivalentif the codewords of can be obtained from the codewords of via a finite sequence of transformations of the following types:
- (1)
Permutation of coordinate positions;
- (2)
Multiplication of the elements in a given position by a non-zero element of the field;
- (3)
Application of a field automorphism to the elements in all coordinate positions.
This definition is well motivated as the transformations (1)–(3) preserve the Hamming distance and the linearity (for more details see [
2], Chapter 7.3). It is based on the action of the semilinear isometries group
on the vector space
, where
is the set of all semilinear mappings, i.e., the general semilinear group
is the group of all monomial
matrices over
, and
is the automorphisms group of the field
. Note that the group
is isomorphic to the wreath product
, and any monomial matrix is a product of a permutation matrix
P and a diagonal matrix
, where
,
. Linear
q-ary codes
and
of the same length
n are equivalent whenever
for some
. If
for an element
, then
T is called an automorphism of the code
C. The set of all automorphisms of
C form a group denoted by
.
Any element can be written as where P is a permutation matrix (permutation part), D is a diagonal matrix (diagonal part) and . Note that in the case of prime q, , and if , then , where is the symmetric group of degree n.
To construct all inequivalent codes with given parameters means to have one representative of each equivalence class. To easily make a distinction between the equivalence classes, we use the concept for a canonical representative, selected on the base of some specific conditions.
Let G be a group acting on a set . This action defines an equivalence relation such that the equivalence classes are the G-orbits in .
Definition 2. A canonical representative map for the action of the group G on the set Ω is a function that satisfies the following two properties:
- 1.
For all it holds that ;
- 2.
For all it holds that implies .
For , is called the canonical form of X, and is the canonical representative of its equivalence class with respect to ρ.
In our case, the set consists of linear codes, and we can take for a canonical representative of one equivalence class a code which is more convenient for our purposes.
In addition, we use integer-valued invariants. An invariant over the set (we use the set of integers ) is a mapping such that if a and b are in the same orbit with respect to the action of the group G. We use two different actions:
- (1)
We take to be the set of all linear codes (with additional restrictions if needed, for example, only self-orthogonal codes with these parameters or codes with a given minimum and dual distance, etc.) and .
- (2)
For the second action, we take to be the set of all codewords of a linear code C, and the group . The main invariant that we use in this case is the weight of the codewords.
Finding the canonical form is generally not difficult, but since it is time-consuming, it is important which canonical representation we choose and how we design the algorithm to calculate it. In this work, we do not present a new algorithm for computing canonical forms, but use the algorithm algorithm described in [
9]. If the coordinates are previously partitioned according to suitable invariants, the algorithm works much faster. An invariant of the coordinates of
C is a function
, such that if
i and
j are in the same orbit with respect to
then
,
. The code
C and the invariant
f define a partition
of the coordinate set
N, such that
for
,
, and two coordinates
are in the same subset of
. The description of some very effective invariants and the process of their application are detailed in [
9,
13].
To define pseudoorbits and coloring, we consider a set of invariants for the codewords of C. The automorphism group acts on the code and partitions the codewords into orbits. All codewords with the same value of an invariant f define a set which consists of one or more orbits called a pseudoorbit. The values of f give an ordering of the pseudoorbits and a coloring of the codewords (the codeword has color ). The most natural invariant for a codeword is its weight.
As already mentioned, we associate with each q-ary linear code a binary matrix such that two codes are equivalent if and only if the corresponding binary matrices are isomorphic. Therefore, we need a definition for isomorphism of binary matrices.
Definition 3. Two binary matrices of the same size are isomorphic if the rows of the second one can be obtained from the rows of the first one by a permutation of the columns.
We also use a similar definition for the equivalence of integer matrices.
Definition 4. Two integer matrices A and B of the same size are isomorphic () if the rows of the second one can be obtained from the rows of the first one by a permutation of the columns.
Any permutation of the columns of an integer (or binary) matrix A which maps the rows of A into the rows of the same matrix is called an automorphism of A. The set of all automorphisms of A is a subgroup of the symmetric group , denoted by .
3. Representing the Objects
To check codes for equivalence, we need (1) to find a proper set of codewords, (2) to set a binary matrix corresponding to (if the code is not binary) and (3) to compute the canonical form of . The set of codewords of the code C must have the following properties:
generates the code C;
is stable with respect to ;
If and then , .
We begin with an algorithm for finding a proper set :
Initialization: ;
generate the set D of all codewords with smallest not considered weight;
Find and order pseoudoorbits of D according to the value of the applied invariant;
For r from 1 to l do the following:
If , then ;
If , then go to point 2 else return.
Since the set
is stable under the action of the automorphism group of the code, together with each vector it contains all its proportional vectors. Let
be a
matrix whose rows are the codewords from the set
. For any column
v of this matrix we add all its proportional column vectors and thus construct the matrix
with the same number of
rows but
columns. If the codes
and
are equivalent then there is a monomial matrix
and an automorphism of the field
such that
for any codeword
. It turns out that
, where
P is a permutation matrix that permutes the rows of
. Since in the definition for equivalence of integer matrices we use only permutations (resp. permutation matrices), we consider the matrices
and
. If we map the monomial matrix
M to the permutation
matrix
by replacing any element with a
matrix in the following way:
we have
. This gives us that
This means that instead of the matrices and , we can use their transpose matrices, which is important if .
Example 1. The ternary codes and are equivalent and the monomial matrix maps the codewords of to codewords of . We take We take only one representative of each class of proportional vectors and thus obtain the set . To define the binary matrix that corresponds to the set , we consider the cases and separately.
Consider first the binary case. Let be a binary matrix whose rows are the codewords from .
Theorem 1. The binary codes and are equivalent if and only if the binary matrices and are isomorphic.
Proof. Let the binary codes and be equivalent. Hence, there is a permutation such that . It follows that , therefore the matrices and are isomorphic.
Let . Then, there is a permutation such that the matrices and have the same rows but are ordered differently. Hence, maps the codewords from to codewords in , therefore the codes and are equivalent. □
Representing a linear code over a field with more than two elements as a binary matrix is already introduced in [
9,
14], but here we present a different approach which is more general and more suitable for codes over fields with
elements.
Let
C be a
q-ary linear code for with
, where
p is a prime, and let
be a primitive element of
. We map each element of
to a circular binary matrix of order
in the same way as shown in (
1).
Consider the matrix
whose rows are the codewords from
for the
q-ary linear code
C of length
n. Replacing all elements in
using (
1), we obtain the matrix
of size
where
. Note that this presentation includes also the proportional vectors of the codewords in
and
.
To construct the needed binary matrix, we use some additional matrices that depend on the field. Denote the
circulant
by
L and let
The automorphism group of this matrix is , where is the presented cycle of length , and , where corresponds to the i-th ciclotomic class modulo . This group is isomorphic to which is the automorphism group of the trivial code. We prove this isomorphism in detail in the following lemma.
Lemma 1. .
Proof. Let us label the rows and the columns of the matrix
from 0 to
. We first consider the case of a prime
. Then the
i-th row of
is
and
if
. The
j-th column is
if
, and
Now, take a permutation . If , then the column with number 0 goes to the j-th column. This column contains only one coordinate equal to 1 and this 1 is on the j-th row which contains only one coordinate equals to 2, namely the -th coordinate. Hence, . Considering the -th column and the positions of the coordinates equal to 1 and 2, we have . Hence, and .
Let us now consider the general case when
for a prime
p and integer
. In this case, if
,
, then
for all
, and
if
,
. We can consider any automorphism
as a permutation of the nonzero elements of the field. If
, the row with number 0 goes to the
i-th row. This means that
(these are the positions of the coordinates equal to 2 in these two rows). But then,
and the number 1 row goes to row with number
and therefore
Hence,
. Then,
Thus,
. In this way we obtain that
. But at the same time,
and so
. It follows that
But , , and , thus . Hence, or . It turns out that , , and for which means that , where and is the Frobenius automorphism defined by , . It follows that . □
We give two examples with the matrices .
Example 2. Consider the cases and :
- (1)
If we have , ;
- (2)
If then ,
, .
Next, we replace the elements 0, 1 and 2 in the matrix
with binary column-vectors with two coordinates as follows:
and in this way from the matrix
we construct the matrix
. To have a matrix with an automorphism group which is isomorphic to
, we expand
with the columns of the matrix
, and denote the obtained matrix by
.
Example 3. Consider again the case . Then, Recall that . Let us map to the permutation , and to where is defined as follows: if then , . This map defines an isomorphism between and , and so .
We need two more square matrices, namely the matrix and the matrix . The automorphism groups of these two matrices are isomorphic to .
Now expand the matrix by adding the rows of the matrix A. In this way we obtain the matrix with rows and columns.
Example 4. Consider the code C with a generator matrix . For this code The automorphism group of the matrix A is isomorphic to and therefore and consists of those automorphisms of A which belong to .
Example 5. Consider the code from Example 4. If we swap the two coordinates and then multiply the second coordinate by 4, we obtain the same code. Hence, the monomial matrix belongs to the automorphism group of this code. We map this monomial matrix to the permutation matrix It is easy to see that the matrix has the same rows as the matrix and so .
The matrices and are related to the matrix which we construct in the following way: (1) we glue the zero matrix to the matrix ; (2) we add the rows of B. The obtained matrix has rows and columns.
Example 6. Consider the same code C with a generator matrix . For this code Theorem 2. The q-ary codes and are equivalent if and only if the binary matrices and are isomorphic.
Proof. In converting the matrix to the matrix , we replace each column with a group of columns and then extend the obtained matrix with extra zero columns and the rows of the matrix B. Therefore, there is a one-to-one correspondence between the permutations of the coordinates of C and the permutations of the corresponding groups of columns in . More precisely, if is a permutation of the n coordinates of C, it goes to the permutation of the columns of such that if then and for , . Moreover, the multiplication of a coordinate by can be considered as applying the permutation on the corresponding group of columns i times. The automorphism of the field applied to all coordinates of all codewords also can be considered as a permutation of the columns of .
The above observation shows that if maps the code to , then the corresponding to permutation of the columns of maps the rows of this matrix to the rows of . Note that can be considered as a permutation followed by multiplications of the coordinates of the code by nonzero elements of the field and eventually applying an automorphism of the field to all coordinates of all codewords.
Conversely, if we take a permutation of the columns of that maps the rows of this matrix to the rows of , then and maps the rows of to the rows of . □
Corollary 1. .
Example 7. The automorphism group of the code is According to Corollary 1, . Indeed, Calculating , , , where is the matrix without the last columns, we see that the extra rows and columns are important to remove the automorphisms of the matrices that do not correspond to automorphisms of the code.
4. The Algorithm
The classification problem we want to solve is the following: given a set of linear q-ary codes of length n and dimension k, take only one representative of each equivalence class. We present the algorithm that gives a solution of the problem. As an input, we have a family of linear codes presented by a generator matrix. The set of these matrices is denoted by W. As an output, we have one representative of each equivalence class with respect to the defined (in the previous sections) equivalence relation in the set of linear codes. Generator matrices of these representatives compose the set U. The algorithm follows these steps:
We find a generating set for each code C. In most cases this is the set of codewords with minimum weight. If this set has a rank smaller than k, we add the codewords with another weight, etc. In some cases we also use other invariants to get a smaller and more convenient generating set.
We take the binary matrix as it is described in the previous section. According to Theorem 2, two codes and are equivalent if and only if .
We find the canonical form of the matrix
as it is described in [
9].
We compare the canonical forms of the matrices. If two matrices have the same canonical form, they are isomorphic and the corresponding codes are equivalent. If these matrices have different canonical forms, the codes are not equivalent.
A pseudocode of Algorithm 1 is given below.
Algorithm 1 Procedure Find_Inequivalent_Codes |
- Input:
A set W of generator matrices of linear codes. - Output:
A maximal set of generator matrices of inequivalent codes. -
var -
HA: array of integers; -
Hf: file; -
not_found: boolean; - 1:
Initialize Hf; - 2:
for all ; - 3:
for each generator matrix do - 4:
Find generating set of the corresponding linear code C; - 5:
construct the corresponding binary matrix ; - 6:
find the canonical form A the matrix and its hash value ; - 7:
call the procedure SEARCH_IN_HASH_TABLE with parameters , A, , {the procedure gives the value of the boolean variable which is true or false} - 8:
if then - 9:
- 10:
end if - 11:
end for
|
For fast search between a large number of similar objects (in our case, binary matrices in canonical form), an abstract data structure known as a hash map is used (more information about hash tables and hash maps can be found in [
15,
16]). It consists of three components: a hash array, a hash function and a hash table or file. The basic idea is that when we get a new code, we first compare its hash with the hash values of already considered codes. If the new hash is different from all already computed, the new code is not equivalent to either of the previous codes. If the hash is equal to the hash value of another code, then we compare the corresponding binary matrices.
Let us introduce the hashing we use in a little more detail. A hash array can be considered as an array of non-negative integers. To any object, the hash function associates an integer, which can be considered as an index in the hash array . Each element of the array (if ) is a pointer (a positive integer) to the location of the object in a hash file or table. In the beginning, all values in the array are 0s. We define a hash file as an addressable memory of a large size that consists of cells. Objects or records (that contain an object) can be written in these cells. A binary file which has direct access, or a dynamic structure (if enough RAM is available), can serve as the hash file (hash table).
We use a non-cryptographic hash function h that maps an object of arbitrary size to a positive integer not larger than the size of the hash array. If the number of different matrices we examine is greater than the size of the hash array, there will be at least two matrices with the same hash value, which is called a collision. The hash function h must meet the following requirements:
The function should be easily and quickly calculated.
It should provide a uniform distribution in the hash array. In this case, the number of collisions will be smaller. Even with a very good hash function, collisions are not excluded. The collision handling technique which we use is known as Separate Chaining (Open Hashing). It is usually implemented using linked lists. A nonzero cell of the hash array points to the first object in a linked list of objects that have the same hash value (if no matches with the same hash value are found, the list consists of one element).
An implementation of this approach is presented in Algorithm 2.
The total complexity of the main algorithm depends on the size of the set of codes (the set
W consists of their generator matrices) but is difficult to determine because it depends on many subalgorithms. For codes with large dimension, one of the most expensive parts is the generating a proper set
of codewords. In this case, a more efficient approach than the standard brute force algorithm (with Gray code) gives the Brouwer–Zimmermann algorithm and its modifications [
17]. The construction of the binary matrix
, which corresponds to the set
, despite its theoretical complexity, practically does not affect the execution time of the algorithm. Finding the canonical form of a binary matrix represents the solution to the graph isomorphism problem, since in this case a bipartite graph is considered. The computation time required depends on whether the binary matrix corresponds to a regular structure such as a combinatorial design and does not particularly depend on the order of the automorphism group. The use of a canonical form reduces the isomorphism problem between binary matrices to a matrix comparison problem which has complexity fixed by the matrix parameters and negligible execution time. In the case of a large number
m of non-equivalent matrices, if a hash map is not used, the complexity of comparing a matrix with the others becomes linear with respect to
m, and when a hash map is used, the complexity is reduced to a constant.
Algorithm 2 Procedure SEARCH_IN_HASH_TABLE |
- Input:
, the matrix A, the integer array , the file - Output:
: boolean; -
type -
structure MATREC { -
matrix M; -
integer } // next matrix with the same hashkey -
var -
, of type MATREC; -
, : integer; - 1:
; - 2:
; - 3:
if then - 4:
append to the file ; - 5:
the position of in the file ; - 6:
return - 7:
else - 8:
; - 9:
while do - 10:
; - 11:
read from the structure {from the file }; - 12:
- 13:
if then - 14:
return - 15:
end if - 16:
end while - 17:
append to the file ; - 18:
the first position of in the file ; - 19:
; - 20:
write from the structure to the file - 21:
return ; - 22:
end if
|
6. Conclusions
This paper is devoted to the algorithm for equivalence of linear codes over finite fields implemented in the program
LCequivalence, which is a module of the software package
QextNewEdition v1.1 [
3]. The program takes as input a file with generator matrices of linear codes and gives as output a file with generator matrices for the maximum set of inequivalent among the input codes. Using this program, one can test for equivalence codes over fields with
elements. As can be seen in
Table 1, the file may contain large number of codes. As far as we know, there is no other program for the equivalence test of a large number of linear codes. The algorithms implemented in the popular packages that deal with linear codes check pairs of codes for equivalence.
Instead of linear codes, the program
LCequivalence can test binary matrices for equivalence. This means that it can be used for equivalence (isomophism) testing of objects that can be represented by binary matrices including graphs [
14].
The program also gives the orders and generators of the automorphism groups of the corresponding binary matrices. Optionally, by marking the corresponding option, the program can also write additional information to the output file, such as a permutation that leads one matrix to its equivalent, orbits, weight distributions of the codes, etc.
No installation of the program is required. It is only necessary for the user to create a directory and download a version of the program that corresponds to the operating system used—Linux or Windows. The interface is very simple. The user can choose from the following seven options:
Find inequivalent codes.
Find inequivalent matrices.
What to print in the output file about codes?
What to print in the output file about matrices?
Change the name of the input file.
Random codes (will be written in a file with name EXAM in the directory RES_DIR0).
Random quasi-cyclic codes (will be written in a file with name EXAM in the directory RES_DIR0).
As an input the program LCequivalence uses a text file with generator matrices of linear codes in the form described in the previous version of Q-Extension. There is no limit to the number of codes in the input file.