DNA Code Design Based on the Cosets of Codes over
Abstract
:1. Introduction
2. Problem Description
- Hamming Distance Constraint (HD): For some Hamming distance d, denotes the Hamming distance between any two code words.
- GC-Content Constraint: Each code word has the same GC content. GC-content is denoted by w, and it is the number of positions in which the word has coordinate C or G. Generally, .
- Homopolymers Constraint: For a DNA code , the homopolymers constraint means that no two consecutive elements in a DNA code word are identical. For example, is not considered in the set of DNA code words since it has a repeated C.
2.1. Codes over
2.1.1. Simplex Alpha Code
2.1.2. Preparata Code and Kerdock Code
2.1.3. Hadamard Code [14]
2.2. Cosets
3. Approaches
- Algebraic Coding Approach [4]: Fields and rings are used to create DNA codes by mapping the elements of the field and rings to DNA nucleotides.
- −
- Codes over Rings: Over the ring of integers , DNA sequences have been created. The mapping is from to with respect to the codes over . There are two possible mappings from to DNA nucleotides, as shown in [9]. The first is with two non-invertible elements, combining 0 and 2 to form . As a result, 0123 corresponds to . The second mapping has one invertible element and one non-invertible element, i.e., is obtained by pairing 0 with 1. That is, 0123 corresponds to .
- Variable Neighborhood Search Approach: This method uses a variety of local search methods to search the DNA codes [16,17]; one of these methods is:
- −
- Seed Building (SB): In this method, we determined an initial set of code words with certain constraints called seed code words, and then we examined all the possible code words randomly with respect to seed code words [8].
3.1. Our Method
Coset Formation
- Step 1. First, a large number of code words from the universe code with , , , and have been taken using Magma by the following Magma command:V: = UniverseCode(Z4,n). We also obtain the code words of the four codes over using Magma. Now, we use Python to complete the rest of the steps.
- Step 2. Employ the coset formation technique outlined in the preceding section to create 40 cosets from the codes over we indicated earlier. To construct the next coset, we should make sure that our next choice of y does not exist in all the previous cosets. Note that, as Aboluion used 40 cosets in her thesis, we also considered that number of cosets here.
- Step 3. To obtain better bounds, the union of the cosets has been applied. The following describes how the union of cosets has been constructed with the different values of n that we considered in this paper.
- For this length, a union of 20 cosets each from Simplex alpha code and Preparata code has been considered. Note that, although that Preparata code and Kerdock code are defined as , here we are using Magma output for . The total number of cosets in this union is 40. Each coset has four code words, since four is the size of the Simplex alpha code and Preparata code of length four. The code words of these two codes are derived by the Magma commands:and.
- The size of the Kerdock code and Preparata code of length 8 is 256, so each coset of these two codes has 256 code words. At length 8, a union of 20 cosets each from the Kerdock and Preparata codes has been used. Therefore, at the union of this length, the Python program will check 40 cosets that contain 10,240 DNA code words.is the Magma command to get the Kerdock code over .
- The union of 20 cosets each from the Simplex alpha code and the Kerdock code have been involved in the calculations. Simplex alpha code with length 16 has size 16, and each coset of this code has 16 code words. The Kerdock code of length 16 has a size of 1024, so each coset here has 1024 code words. A Python program was run here over 40 cosets with 20,800 DNA code words.
- Hadamard code was used with the Kerdock code at the union of this length. There are 20 cosets of size 128 from the Hadamard code of length 32 given by this command:and 20 cosets of size 4096 from the Kerdock code. Therefore, the union of these codes over has 40 cosets needed to work on them, with 84,480 DNA code words of length 32.
After we obtain the union of the cosets from the two codes over at each value of n, we have the three remaining steps to complete our work on the cosets. - Step 1. After removing the duplicated elements in the union, we map to considering the two previous mappings. Then we apply the no-repetition constraint on the code words.
- Step 2. Apply the SB approach to the union by first choosing two random seed code words that satisfy the given three constraints. These seed code words satisfy HD and GC-content constraints for a given d with and the no-repetition constraint to find lower bounds for . Note that lower bounds depend on the choice of seed code words, and we chose seed code words with better lower bounds.
- Step 3. The program then examines all the DNA code words that exist in the set of the union and keeps those code words that satisfy the given constraints with respect to the seed code words. The program then collects these code words in a new set and gives its size to obtain lower bounds for DNA codes. Note that, for the lower bounds of that we obtained, we should make sure they do not exceed the upper bound of . The upper bound has been taken from Theorem 5 in reference [19].
4. Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Adleman, L.M. Molecular computation of solutions to combinatorial problems. Science 1994, 266, 1021–1024. [Google Scholar] [CrossRef] [PubMed]
- Smith, D.H.; Aboluion, N.; Montemanni, R.; Perkins, S. Linear and nonlinear constructions of DNA codes with Hamming distance d and constant GC-content. Discret. Math. 2011, 311, 1207–1219. [Google Scholar] [CrossRef]
- Brenner, S.; Lerner, R.A. Encoded combinatorial chemistry. Proc. Natl. Acad. Sci. USA 1992, 89, 5381–5383. [Google Scholar] [CrossRef] [PubMed]
- Limbachiya, D.; Rao, B.; Gupta, M.K. The art of DNA strings: Sixteen years of DNA coding theory. arXiv 2016, arXiv:1607.00266. [Google Scholar]
- Benerjee, K.G.; Deb, S.; Gupta, M.K. On conflict free DNA codes. Cryptogr. Commun. 2021, 13, 143–171. [Google Scholar] [CrossRef]
- Dougherty, S.T.; Korban, A.; Sahinkaya, S.; Ustun, D. Construction of DNA codes from composite matrices and a bio-inspired optimization algorithm. IEEE Trans. Inf. Theory 2022, 69, 1588–1603. [Google Scholar] [CrossRef]
- Gaborit, P.; King, O.D. Linear constructions for DNA codes. Theor. Comput. Sci. 2005, 334, 99–113. [Google Scholar] [CrossRef]
- Montemanni, R.; Smith, D.H.; Koul, N. Three metaheuristics for the construction of constant GC-content DNA codes. Lect. Notes Manag. Sci. 2014, 6, 167–175. [Google Scholar]
- Aboluion, N.A. The Construction of DNA Codes Using a Computer Algebra System; University of South Wales: Newport, UK, 2011. [Google Scholar]
- Aboluion, N.; Smith, D.H.; Perkins, S. Linear and nonlinear constructions of DNA codes with Hamming distance d, constant GC-content and a reverse-complement constraint. Discret. Math. 2012, 312, 1062–1075. [Google Scholar] [CrossRef]
- Wan, Z.H. Quaternary Codes; World Scientific: Singapore, 1997; Volume 8. [Google Scholar]
- Gupta, M.K. On Some Linear Codes over Z2s; Indian Institute of Technology: Kanpur, India, 1999. [Google Scholar]
- Hammons, A.R.; Kumar, P.V.; Calderbank, A.R.; Sloane, N.J.A.; Solé, P. The -linearity of Kerdock, Preparata, Goethals, and related codes. IEEE Trans. Inf. Theory 1994, 40, 2. [Google Scholar] [CrossRef]
- Barrolleta, R.D. Partial Permutation Decoding for -Linear Hadamard and Kerdock Codes. Ph.D. Thesis, Universitat Autònoma de Barcelona, Barcelona, Spain, 2016. [Google Scholar]
- Cannon, J.; Bosma, W.; Fieker, C.; Steel, A. Handbook of Magma Functions; The University of Sydney: Sydney, Australia, 2006. [Google Scholar]
- Kawashimo, S.; Ono, H.; Sadakane, K.; Yamashita, M. Dynamic neighborhood searches for thermodynamically designing DNA sequence. In International Workshop on DNA-Based Computers; Springer: Berlin/Heidelberg, Germany, 2007; pp. 130–139. [Google Scholar]
- Hansen, P.; Mladenović, N.; Perez, J.A.M. Variable neighborhood search: Methods and applications. Ann. Oper. Res. 2010, 175, 367–407. [Google Scholar] [CrossRef]
- Beazley, D.M. Python Essential Reference; Sams Publishing: Carmel, IN, USA, 2006. [Google Scholar]
- King, O.D. Bounds for DNA codes with constant GC-content. arXiv 2003, arXiv:0306197. [Google Scholar] [CrossRef] [PubMed]
0000 | 1000 | 2000 | 3000 | 0100 | 1100 |
2100 | 3100 | 0200 | 1200 | 2200 | 3200 |
0300 | 1300 | 2300 | 3300 | 0010 | 1010 |
2010 | 3010 | 0110 | 1110 | 2110 | 3110 |
0210 | 1210 | 2210 | 3210 | 0310 | 1310 |
2310 | 3310 | 0020 | 1020 | 2020 | 3020 |
0120 | 1120 | 2120 | 3120 |
Lower Bound with Homopolymers Constraint | |
---|---|
4 | |
12 | |
3 | |
8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alahmadi, A.N.; Melibari, F.A.; Gupta, M.K.
DNA Code Design Based on the Cosets of Codes over
Alahmadi AN, Melibari FA, Gupta MK.
DNA Code Design Based on the Cosets of Codes over
Alahmadi, Adel N., Fatimah Anas Melibari, and Manish K. Gupta.
2023. "DNA Code Design Based on the Cosets of Codes over
Alahmadi, A. N., Melibari, F. A., & Gupta, M. K.
(2023). DNA Code Design Based on the Cosets of Codes over