A New Way to Store Simple Text Files
Abstract
1. Introduction
1.1. Related Work
1.1.1. Compression
1.1.2. Steganography of the Text
1.2. Graphic Format png
2. Method
2.1. Variant 1: Text Encoded in Extended ASCII
- The text file is loaded into the buffer.
- The following are added to its content: ETX, filename with the extension, ETX, where ETX means end of text and is encoded with an ASCII code of 3.
- The image size (height and width) is calculated using the formula
- If the length of the text is less than
- A two-dimensional array is created, with elements consisting of 3-element tuples.
- Text characters are converted to 8-bit numbers (according to Extended ASCII encoding) and grouped into 3-element tuples. Successive tuples are saved as successive elements of the two-dimensional array T.
- The two-dimensional array of tuples T is treated as a pixel array and saved in the png format. Writing to png is possible using ready-made programming tools, such as the Python library Pillow [31].
- (optional) The name of the png file is the hash digest of the original filename, computed with a selected hash algorithm, e.g., SHA3 [32].
- Successive pixels of the graphic file are read to obtain the values representing the data of the text file, until a pixel with an RGB color component equal to 3 is found.
- The RGB components of the pixels read so far are interpreted as consecutive ASCII characters and saved in a text file.
- The next pixels are read to obtain the ASCII characters representing the file name and extension, until another RGB component value of 3 is encountered.
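The steps of Variant 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image-size formula is not reproduced in this text, so the sketch assumes the smallest square image that holds the data; "Extended ASCII" is modelled as Latin-1; unused pixel components are zero-filled; and the optional hashing step is read as naming the png file after the SHA3-256 digest of the original filename. The function names (`encode_variant1`, `decode_variant1`, `save_png`, `hashed_png_name`) are hypothetical.

```python
import hashlib
import math

ETX = 3  # ASCII "end of text" code used as the separator in the steps above


def encode_variant1(text: str, filename: str) -> list:
    """Pack text + ETX + filename + ETX into a square grid of RGB tuples."""
    if chr(ETX) in text + filename:
        raise ValueError("input must not contain the ETX character itself")
    # Assumption: "Extended ASCII" is modelled as Latin-1, one byte per character.
    payload = (text.encode("latin-1") + bytes([ETX])
               + filename.encode("latin-1") + bytes([ETX]))
    n_pixels = math.ceil(len(payload) / 3)
    # Assumption: smallest square image that holds all pixels.
    side = math.isqrt(n_pixels)
    if side * side < n_pixels:
        side += 1
    # Assumption: unused components are zero-filled so every pixel is complete.
    padded = payload + bytes(side * side * 3 - len(payload))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def decode_variant1(rows: list) -> tuple:
    """Read components up to the first ETX (text), then up to the second (filename)."""
    flat = bytes(c for row in rows for px in row for c in px)
    first = flat.index(ETX)
    second = flat.index(ETX, first + 1)
    return flat[:first].decode("latin-1"), flat[first + 1:second].decode("latin-1")


def save_png(rows: list, path: str) -> None:
    """Write the tuple grid as a png file with Pillow (imported lazily)."""
    from PIL import Image  # third-party; only needed for this step
    side = len(rows)
    image = Image.new("RGB", (side, side))
    image.putdata([px for row in rows for px in row])
    image.save(path, format="PNG")


def hashed_png_name(filename: str) -> str:
    """One reading of the optional step: name the png after a SHA3-256 digest."""
    return hashlib.sha3_256(filename.encode("utf-8")).hexdigest() + ".png"
```

For example, `decode_variant1(encode_variant1("Hello, PNG!", "data1.txt"))` gives back the original text and filename. Because decoding stops at the first component equal to 3, the sketch rejects input that already contains that character.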
2.2. Variant 2: UTF-8 Coded Text
- The text file is loaded into the buffer.
- The following are added to its content: ETX, filename with the extension, ETX, where ETX means end of text and is encoded with an ASCII code of 3.
- The image size (height and width) is calculated using the formula
- If the length of the text is less than
- A two-dimensional array is created, with elements consisting of 3-element tuples.
- Text characters are converted to numbers in the range 0–65,535 (according to UTF-8 encoding).
- Each of these values is stored in a positional numeral system with base 256, i.e., as a pair of digits, each in the range 0–255.
- The resulting pairs of digits are written in tuples of length 3. Successive tuples are saved as successive elements of the two-dimensional array T.
- The two-dimensional array of tuples T is treated as a pixel array and saved in the png format. Writing to png is possible using ready-made programming tools, such as the Python library Pillow [31].
- (optional) The name of the png file is the hash digest of the original filename, computed with a selected hash algorithm, e.g., SHA3 [32].
- Successive pixels of the graphic file are read to obtain the values representing the data of the text file, until a pixel with an RGB color component equal to 3 is found.
- From the RGB components of the pixels, every two consecutive values are taken. These values are the coefficients (digits) of a number written in the numeral system with base 256.
- The resulting numbers are interpreted as successive UTF-8 characters and saved in a text file.
- The next pixels are read until another RGB component value of 3 is encountered and, as in step 2, every two consecutive values are combined using (6). The resulting string is the name and extension of the text file.
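The base-256 step of Variant 2 can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the symbol names `high` and `low` are hypothetical, characters are mapped to numbers with `ord()` (Unicode code points up to 65,535), the image is assumed to be the smallest square that fits, and ETX is stored as the digit pair (0, 3). Since a single digit of an ordinary character can also equal 3, the sketch looks for ETX among the recombined values rather than among raw components, which is a deliberate simplification of the decoding steps above.

```python
import math

ETX = 3  # "end of text" value separating the text from the filename


def to_base256(value: int) -> tuple:
    """Split a value from 0..65535 into two base-256 digits (high, low)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value {value} does not fit in two base-256 digits")
    return divmod(value, 256)


def from_base256(high: int, low: int) -> int:
    """Recombine two base-256 digits into the original value."""
    return high * 256 + low


def encode_variant2(text: str, filename: str) -> list:
    if chr(ETX) in text + filename:
        raise ValueError("input must not contain the ETX character itself")
    # Assumption: characters are mapped to numbers via their code points.
    values = [ord(ch) for ch in text] + [ETX] + [ord(ch) for ch in filename] + [ETX]
    digits = [d for v in values for d in to_base256(v)]
    n_pixels = math.ceil(len(digits) / 3)
    # Assumption: smallest square image that holds all pixels, zero-padded.
    side = math.isqrt(n_pixels)
    if side * side < n_pixels:
        side += 1
    digits += [0] * (side * side * 3 - len(digits))
    pixels = [tuple(digits[i:i + 3]) for i in range(0, len(digits), 3)]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def decode_variant2(rows: list) -> tuple:
    flat = [c for row in rows for px in row for c in px]
    # Take the components two at a time and recombine them.
    values = [from_base256(flat[i], flat[i + 1]) for i in range(0, len(flat) - 1, 2)]
    first = values.index(ETX)
    second = values.index(ETX, first + 1)
    text = "".join(chr(v) for v in values[:first])
    name = "".join(chr(v) for v in values[first + 1:second])
    return text, name
```

For instance, the value 322 is split into the digits (1, 66), because 322 = 1 · 256 + 66. The resulting grid of tuples can be written to a png file in the same way as in Variant 1.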
3. Analysis And Discussions
3.1. Limitations
3.2. Case Study
- data1.txt: a text file containing some English text encoded in extended ASCII
- data2.txt: a text file containing some Polish text encoded in UTF-8
- data3.txt: a text file containing random digits
- data4.json: a text file in JSON format containing the .ipynb version of the python.py file
- python.py: a Python file whose content is the source code published on the GitHub platform
- latex.tex: the file containing the LaTeX source code of this article
- A: Microsoft Windows 10 equipped with an Intel(R) Core(TM) i7-8565U CPU and 16 GB of RAM;
- B: Linux Ubuntu 18.04.4 LTS equipped with an Intel(R) Core(TM) 2 Duo T7200 CPU and 2 GB of RAM;
- C: MacOS Catalina 10.15.3 equipped with an Intel(R) i7-3667U CPU and 8 GB of RAM.
3.3. Compression
3.4. Application in Data Transfer
3.5. Application in Steganography
3.6. Application in Cryptography
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A
References
- Özköse, H.; Arı, E.S.; Gencer, C. Yesterday, Today and Tomorrow of Big Data. Procedia Soc. Behav. Sci. 2015, 195, 1042–1050.
- Bello-Orgaz, G.; Jung, J.J.; Camacho, D. Social big data: Recent achievements and new challenges. Inf. Fusion 2016, 28, 45–59.
- Plageras, A.P.; Psannis, K.E.; Stergiou, C.; Wang, H.; Gupta, B. Efficient IoT-based sensor BIG Data collection—Processing and analysis in smart buildings. Future Gener. Comput. Syst. 2018, 82, 349–357.
- Pottier, R.; Menaud, J. TrustyDrive, a Multi-cloud Storage Service That Protects Your Privacy. In Proceedings of the 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 27 June–2 July 2016; pp. 937–940.
- ECB Says One of Its Websites Was Hacked, Data Possibly Captured. 2019. Available online: https://news.bloomberglaw.com/banking-law/ecb-says-one-of-its-websites-was-hacked-data-possibly-captured (accessed on 24 March 2020).
- Kapczyński, A.; Banasik, A. Biometric logical access control enhanced by use of steganography over secured transmission channel. In Proceedings of the 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS’2011, Prague, Czech Republic, 15–17 September 2011; Volume 2, pp. 696–699.
- Lawnik, M. Generalized logistic map and its application in chaos based cryptography. J. Phys. Conf. Ser. 2017, 936.
- Lawnik, M.; Kapczyński, A. Application of modified Chebyshev polynomials in asymmetric cryptography. Comput. Sci. 2019, 20, 367–381.
- Chen, J.K.; Lee, W.Z. An Introduction of NoSQL Databases Based on Their Categories and Application Industries. Algorithms 2019, 12, 106.
- Strohbach, M.; Daubert, J.; Ravkin, H.; Lischka, M. Big Data Storage. In New Horizons for a Data-Driven Economy: A Roadmap for Usage and Exploitation of Big Data in Europe; Cavanillas, J.M., Curry, E., Wahlster, W., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 119–141.
- Almansouri, H.T.; Masmoudi, Y. Hadoop Distributed File System for Big data analysis. In Proceedings of the 2019 4th World Conference on Complex Systems (WCCS), Ouarzazate, Morocco, 22–25 April 2019; pp. 1–5.
- Meier, A.; Kaufmann, M. SQL & NoSQL Databases: Models, Languages, Consistency Options and Architectures for Big Data Management; Springer Vieweg: Berlin, Germany, 2019.
- Bisong, E. Google BigQuery. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Apress: Berkeley, CA, USA, 2019; pp. 485–517.
- Kaur, K.; Sachdeva, M. Performance evaluation of NewSQL databases. In Proceedings of the 2017 International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2017; pp. 1–5.
- Murazzo, M.; Gómez, P.; Rodríguez, N.; Medel, D. Database NewSQL Performance Evaluation for Big Data in the Public Cloud. In Cloud Computing and Big Data; Naiouf, M., Chichizola, F., Rucci, E., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 110–121.
- Siddiqa, A.; Karim, A.; Gani, A. Big data storage technologies: A survey. Front. Inf. Technol. & Electron. Eng. 2017, 18, 1040–1070.
- Salomon, D. Introduction. In Data Compression: The Complete Reference; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–12.
- Portable Network Graphics (PNG) Specification (Second Edition). 2003. Available online: https://www.w3.org/TR/2003/REC-PNG-20031110/#F-Relationship (accessed on 24 March 2020).
- ZIP File Format Specification. 2019. Available online: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT (accessed on 24 March 2020).
- Majumder, A.; Changder, S. A Novel Approach for Text Steganography: Generating Text Summary Using Reflection Symmetry. Procedia Technol. 2013, 10, 112–120.
- Hamdan, A.M.; Hamarsheh, A. AH4S: An algorithm of text in text steganography using the structure of omega network. Secur. Commun. Netw. 2016, 9, 6004–6016.
- Lee, C.F.; Chen, H.L. Lossless Text Steganography in Compression Coding. In Recent Advances in Information Hiding and Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 155–179.
- Liu, Y.; Wu, J.; Xin, G. Multi-keywords carrier-free text steganography based on part of speech tagging. In Proceedings of the 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Guilin, China, 29–31 July 2017; pp. 2102–2107.
- Wang, K.; Gao, Q. A Coverless Plain Text Steganography Based on Character Features. IEEE Access 2019, 7, 95665–95676.
- Alsaadi, H.I.; Al-Anni, M.K.; Almuttairi, R.M.; Bayat, O.; Ucan, O.N. Text Steganography in Font Color of MS Excel Sheet. In DATA ’18: Proceedings of the First International Conference on Data Science, E-Learning and Information Systems; ACM: New York, NY, USA, 2018; pp. 101–107.
- Mandal, K.K.; Singh, P.K. Information Hiding in Text Steganography: A Different Approach. In Proceedings of the 2nd International Conference on Advanced Computing and Software Engineering (ICACSE), Sultanpur, India, 8–9 February 2019.
- Fateh, M.; Rezvani, M. An email-based high capacity text steganography using repeating characters. Int. J. Comput. Appl. 2018, 1–7.
- Bharti, J.; Solanki, S.; Beliya, A. Comparison of LSB methods and pattern. In Proceedings of the 2017 International Conference on Recent Innovations in Signal processing and Embedded Systems (RISE), Bhopal, India, 27–29 October 2017; pp. 250–256.
- Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 1977, 23, 337–343.
- Huffman, D.A. A Method for the Construction of Minimum-Redundancy Codes. Proc. IRE 1952, 40, 1098–1101.
- Pillow. 2019. Available online: https://python-pillow.org/ (accessed on 24 March 2020).
- Dworkin, M. SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions; NIST: Gaithersburg, MD, USA, 2015.
- Textract. 2019. Available online: https://textract.readthedocs.io/en/stable/ (accessed on 24 March 2020).
- Kumari, M.; Gupta, S.; Sardana, P. A Survey of Image Encryption Algorithms. 3D Res. 2017, 8, 37.
- Uhl, A.; Pommer, A. Image and Video Encryption. In Image and Video Encryption: From Digital Rights Management to Secured Personal Communication; Springer US: Boston, MA, USA, 2005; pp. 45–134.
- Guan, Z.H.; Huang, F.; Guan, W. Chaos-based image encryption algorithm. Phys. Lett. A 2005, 346, 153–157.
- Yavuz, E.; Yazıcı, R.; Kasapbaşı, M.C.; Yamaç, E. A chaos-based image encryption algorithm with simple logical functions. Comput. Electr. Eng. 2016, 54, 471–483.
- Arab, A.; Rostami, M.J.; Ghavami, B. An image encryption method based on chaos system and AES algorithm. J. Supercomput. 2019, 75, 6663–6682.
- Hua, Z.; Zhou, Y.; Huang, H. Cosine-transform-based chaotic system for image encryption. Inf. Sci. 2019, 480, 403–419.
- Duda, O.; Kochan, V.; Kunanets, N.; Matsiuk, O.; Pasichnyk, V.; Sachenko, A.; Pytlenko, T. Data processing in IoT for smart city systems. In Proceedings of the 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Metz, France, 18–21 September 2019; Volume 1, pp. 96–99.
- Marszałek, Z. Performance tests on merge sort and recursive merge sort for big data processing. Tech. Sci. 2018, 21, 19–35.
- Shatnawi, A.; AlZahouri, Y.; Shehab, M.A.; Jararweh, Y.; Al-Ayyoub, M. Toward a new approach for sorting extremely large data files in the big data era. Clust. Comput. 2019, 22, 819–828.
- Chen, H.; Wan, J.; Li, X. Research and implementation of database high performance sorting algorithm with big data. In Proceedings of the 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), Beijing, China, 10–12 March 2017; pp. 94–99.
- Lawnik, M. Generation of numbers with the distribution close to uniform with the use of chaotic maps. In Proceedings of the 2014 4th International Conference On Simulation And Modeling Methodologies, Technologies And Applications (SIMULTECH), Berlin, Germany, 5–7 September 2014; pp. 451–455.

File | Text File Size (B) | PC | png Size (B) | CR | Avg Time (s) | std Time (s) |
---|---|---|---|---|---|---|
data1.txt | 184,401 | A | 156,621 | 0.8493 | 0.0836 | 0.0014 |
data1.txt | 184,401 | B | 156,630 | 0.8493 | 0.2687 | 0.0052 |
data1.txt | 184,401 | C | 156,625 | 0.8493 | 0.1749 | 0.0127 |
data3.txt | 1,091,008 | A | 569,894 | 0.5223 | 0.5565 | 0.0083 |
data3.txt | 1,091,008 | B | 569,898 | 0.5223 | 1.6488 | 0.0183 |
data3.txt | 1,091,008 | C | 569,858 | 0.5223 | 1.6279 | 0.7378 |
data4.json | 12,563 | A | 9614 | 0.7652 | 0.0065 | 0.0010 |
data4.json | 12,563 | B | 9615 | 0.7653 | 0.0196 | 0.0016 |
data4.json | 12,563 | C | 9618 | 0.7655 | 0.0130 | 0.0017 |
python.py | 8740 | A | 7197 | 0.8234 | 0.0044 | 0.0001 |
python.py | 8740 | B | 7199 | 0.8236 | 0.0126 | 0.0012 |
python.py | 8740 | C | 7197 | 0.8234 | 0.0077 | 0.0009 |
latex.tex | 38,153 | A | 32,006 | 0.8388 | 0.0194 | 0.0012 |
latex.tex | 38,153 | B | 31,986 | 0.8383 | 0.0674 | 0.0023 |
latex.tex | 38,153 | C | 31,985 | 0.8383 | 0.0467 | 0.0044 |

File | Text File Size (B) | PC | png Size (B) | CR | Avg Time (s) | std Time (s) |
---|---|---|---|---|---|---|
data1.txt | 184,401 | A | 194,749 | 1.0561 | 0.2804 | 0.0047 |
data1.txt | 184,401 | B | 194,761 | 1.0561 | 0.5984 | 0.0088 |
data1.txt | 184,401 | C | 194,734 | 1.0560 | 0.3804 | 0.0504 |
data2.txt | 252,640 | A | 196,413 | 0.7774 | 0.3885 | 0.0057 |
data2.txt | 252,640 | B | 196,400 | 0.7773 | 0.8292 | 0.0158 |
data2.txt | 252,640 | C | 196,394 | 0.7773 | 0.5325 | 0.0696 |
data3.txt | 1,091,008 | A | 588,732 | 0.5396 | 1.6277 | 0.1736 |
data3.txt | 1,091,008 | B | 588,751 | 0.5396 | 3.7014 | 0.0328 |
data3.txt | 1,091,008 | C | 588,741 | 0.5396 | 2.2655 | 0.1097 |
data4.json | 12,563 | A | 9990 | 0.7951 | 0.0169 | 0.0024 |
data4.json | 12,563 | B | 10,008 | 0.7966 | 0.0469 | 0.0031 |
data4.json | 12,563 | C | 10,015 | 0.7971 | 0.0281 | 0.0028 |
python.py | 8740 | A | 7162 | 0.8194 | 0.0112 | 0.0007 |
python.py | 8740 | B | 7163 | 0.8195 | 0.0282 | 0.0023 |
python.py | 8740 | C | 7159 | 0.8191 | 0.0178 | 0.0015 |
latex.tex | 38,153 | A | 30,668 | 0.8038 | 0.0507 | 0.0044 |
latex.tex | 38,153 | B | 30,684 | 0.8042 | 0.1345 | 0.0081 |
latex.tex | 38,153 | C | 30,676 | 0.8040 | 0.0786 | 0.0053 |

File | Text File Size (B) | Compression Method | Compressed File Size (B) | CR |
---|---|---|---|---|
data1.txt | 184,401 | zip | 70,990 | 0.3849 |
data1.txt | 184,401 | bz2 | 60,433 | 0.3277 |
data1.txt | 184,401 | gz | 74,393 | 0.4034 |
data2.txt | 252,640 | zip | 99,364 | 0.3933 |
data2.txt | 252,640 | bz2 | 81,066 | 0.2298 |
data2.txt | 252,640 | gz | 103,412 | 0.4093 |
data3.txt | 1,091,008 | zip | 479,335 | 0.4393 |
data3.txt | 1,091,008 | bz2 | 458,533 | 0.4202 |
data3.txt | 1,091,008 | gz | 499,474 | 0.4578 |
data4.json | 12,563 | zip | 2567 | 0.2043 |
data4.json | 12,563 | bz2 | 2518 | 0.2004 |
data4.json | 12,563 | gz | 2512 | 0.1999 |
python.py | 8740 | zip | 2202 | 0.2519 |
python.py | 8740 | bz2 | 2221 | 0.2541 |
python.py | 8740 | gz | 2128 | 0.2434 |
latex.tex | 38,153 | zip | 12,684 | 0.3324 |
latex.tex | 38,153 | bz2 | 12,609 | 0.3304 |
latex.tex | 38,153 | gz | 13,184 | 0.3455 |
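
The CR column in the table above matches the ratio of the compressed size to the original size (for example, 70,990 / 184,401 ≈ 0.385 for data1.txt with zip). A comparable measurement can be sketched with Python's standard library, as below. This is only an approximation of the experiment: the tools and settings used by the authors are not stated here, and the sketch compresses in memory with raw zlib (the DEFLATE algorithm commonly used inside zip archives), bz2, and gzip, so container overhead differs from real archive files. The function names are hypothetical.

```python
import bz2
import gzip
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """CR = compressed size / original size; values below 1 mean the data shrank."""
    if not original:
        raise ValueError("original data must not be empty")
    return len(compressed) / len(original)


def compare_codecs(data: bytes) -> dict:
    """Compress the same bytes with three standard-library codecs."""
    return {
        # Assumption: zlib stands in for zip; a real archive adds headers.
        "deflate": compression_ratio(data, zlib.compress(data, 9)),
        "bz2": compression_ratio(data, bz2.compress(data)),
        "gz": compression_ratio(data, gzip.compress(data)),
    }


if __name__ == "__main__":
    sample = b"A New Way to Store Simple Text Files. " * 500
    for name, ratio in compare_codecs(sample).items():
        print(f"{name}: CR = {ratio:.4f}")
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `compare_codecs` gives ratios on the same scale as the table, although the exact figures depend on the tool versions and compression levels.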

File | Encoding | Time | A | B | C |
---|---|---|---|---|---|
data1.txt | ASCII | avg | 0.1840 | 0.4015 | 0.2518 |
data1.txt | ASCII | std | 0.0025 | 0.0052 | 0.0118 |
data1.txt | UTF-8 | avg | 0.0718 | 0.1547 | 0.1104 |
data1.txt | UTF-8 | std | 0.0013 | 0.0059 | 0.0068 |
data2.txt | ASCII | avg | - | - | - |
data2.txt | ASCII | std | - | - | - |
data2.txt | UTF-8 | avg | 0.0955 | 0.2026 | 0.1375 |
data2.txt | UTF-8 | std | 0.0016 | 0.0058 | 0.0047 |
data3.txt | ASCII | avg | 2.4929 | 5.2552 | 3.5691 |
data3.txt | ASCII | std | 0.0076 | 0.0373 | 0.9982 |
data3.txt | UTF-8 | avg | 0.4629 | 0.9612 | 0.6622 |
data3.txt | UTF-8 | std | 0.0076 | 0.0127 | 0.0381 |
data4.json | ASCII | avg | 0.0058 | 0.0127 | 0.0085 |
data4.json | ASCII | std | 0.0012 | 0.0002 | 0.0021 |
data4.json | UTF-8 | avg | 0.0058 | 0.0115 | 0.0078 |
data4.json | UTF-8 | std | 0.0001 | 0.0025 | 0.0021 |
python.py | ASCII | avg | 0.0039 | 0.0084 | 0.0059 |
python.py | ASCII | std | 0.0009 | 0.0011 | 0.0015 |
python.py | UTF-8 | avg | 0.0044 | 0.0075 | 0.0055 |
python.py | UTF-8 | std | 0.0001 | 0.00147 | 0.0008 |
latex.tex | ASCII | avg | 0.0240 | 0.0507 | 0.0319 |
latex.tex | ASCII | std | 0.0011 | 0.0008 | 0.0026 |
latex.tex | UTF-8 | avg | 0.0165 | 0.0332 | 0.0230 |
latex.tex | UTF-8 | std | 0.0007 | 0.0008 | 0.0019 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Lawnik, M.; Pełka, A.; Kapczyński, A. A New Way to Store Simple Text Files. Algorithms 2020, 13, 101. https://doi.org/10.3390/a13040101