LOSSLESS BINARY DATA COMPRESSION METHODS FOR INCREASING THE PRODUCTIVITY OF SOFTWARE SYSTEMS
DOI:
https://doi.org/10.32782/IT/2025-1-22
Keywords:
compression, algorithm, entropy, software system, C#, ASP.NET.
Abstract
The article describes general-purpose data compression methods that do not depend on the format or type of the data. Data compression is the process of encoding data to reduce its size; compression is lossless when decoding restores the data exactly in its original form. Lossless compression is limited by the information entropy of the message: the lower the entropy, the greater the achievable compression ratio. High-entropy data, for example random data or data already compressed with a near-optimal encoding, cannot be compressed appreciably. The aim of the work is to investigate how data compression algorithms behave on data of different types, formats, and information entropy. Lossless compression algorithms fall into subcategories, in particular dictionary methods and entropy coding methods, which differ in their principle of operation. Dictionary and entropy methods can also be combined to increase compression efficiency. The scientific novelty lies in identifying patterns between the algorithms used and the compression ratios achieved for specific formats. For the first time, measurements for many different compression algorithms, both standalone and composite ones built from other algorithms, were processed and systematized. As a result, data were obtained on which compression methods are best suited to images, web pages, source code, and other widely used formats. The research methodology is based on measuring several characteristics of the original and compressed files and of the algorithms' operation, and then comparing these measurements. Before compression, files are characterized by their format and information entropy; after compression, the compression ratio characterizes the efficiency of each algorithm. The study covers universal algorithms that treat their input as a plain sequence of bytes. They can therefore be applied to a wide range of file formats, including those most often used in distributed data storage systems.
The conclusion contains practical recommendations for the application of data compression algorithms. The data obtained during the study can be used for integration into other software products or further analysis.
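The two measurements named in the abstract, information entropy before compression and compression ratio after it, can be illustrated with a minimal sketch. It is not the article's implementation: it is written in Python rather than C#, it assumes a simple byte-frequency (order-0) entropy estimate, and it uses DEFLATE from the standard `zlib` module as one example of a combined dictionary and entropy method. The function names `byte_entropy` and `compression_ratio` and the sample inputs are hypothetical.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0).

    Assumption: bytes are treated as independent symbols, so this is only a
    first-order estimate of the information entropy of the message.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over symbols of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by DEFLATE-compressed size (higher means better compression)."""
    compressed = zlib.compress(data, level)
    # Lossless round trip: decoding must restore the original bytes exactly.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Hypothetical inputs: one highly redundant, one effectively incompressible.
    samples = {
        "repetitive text": b"lossless compression " * 2000,
        "random bytes": os.urandom(40_000),
    }
    for name, data in samples.items():
        print(
            f"{name}: entropy={byte_entropy(data):.2f} bits/byte, "
            f"ratio={compression_ratio(data):.2f}"
        )
```

In this sketch the random sample should show entropy close to 8 bits per byte and a ratio of about 1, while the repetitive sample compresses far better than its byte-level entropy alone suggests, because the dictionary stage also exploits repeated substrings. Other standard-library codecs such as `lzma` or `bz2` could be substituted for `zlib` to compare algorithms in the same way.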
References
Shannon C. E. A mathematical theory of communication. Bell System Technical Journal. 1948. Vol. 27, no. 4. P. 623–656. https://doi.org/10.1002/j.1538-7305.1948.tb00917.x (date of access: 19.03.2025).
MacKay D. J. C. Information theory, inference & learning algorithms. Cambridge, UK : Cambridge University Press, 2003. 640 p.
State-of-the-Art Trends in Data Compression: COMPROMISE Case Study / D. Podgorelec et al. Entropy. 2024. Vol. 26, no. 12. P. 1032. https://doi.org/10.3390/e26121032 (date of access: 20.03.2025).
Mohideen R. M. K., Peter P., Weickert J. A systematic evaluation of coding strategies for sparse binary images. Signal Processing: Image Communication. 2021. Vol. 99. P. 116424. https://doi.org/10.1016/j.image.2021.116424 (date of access: 22.03.2025).
Collet Y. LZ4 Block Format Description. GitHub. URL: https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md (date of access: 22.03.2025).
Ziv J., Lempel A. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory. 1977. Vol. 23, no. 3. P. 337–343. https://doi.org/10.1109/tit.1977.1055714 (date of access: 22.03.2025).
Deutsch P. DEFLATE Compressed Data Format Specification version 1.3. RFC Editor, 1996. https://doi.org/10.17487/rfc1951 (date of access: 23.03.2025).
Alakuijala J., Szabadka Z. Brotli compressed data format. RFC Editor, 2016. https://doi.org/10.17487/rfc7932 (date of access: 23.03.2025).
Chapter 7: Collecting User Input with Forms. Web Applications with ASP.NET Core Blazor. 2024. P. 129–160. https://doi.org/10.1515/9781501519475-010 (date of access: 25.03.2025).
Jeromel A., Žalik B. An efficient lossy cartoon image compression method. Multimedia Tools and Applications. 2019. Vol. 79, no. 1–2. P. 433–451. https://doi.org/10.1007/s11042-019-08126-7 (date of access: 25.03.2025).