Base-Delta Dynamic Block Length and Optimization on File Compression

Authors

  • Tommy, Universitas Harapan Medan
  • Ferdy Riza, Universitas Muhammadiyah Sumatera Utara
  • Rosyidah Siregar, Universitas Harapan Medan
  • Manovri Yeni, Universitas Muhammadiyah Aceh
  • Andi Marwan Elhanafi, Universitas Harapan Medan
  • Ruswan Nurmadi, Universitas Harapan Medan

DOI:

10.47709/cnahpc.v5i1.1993

Keywords:

File Compression, Delta, Difference, Dynamic Block, Optimization

Abstract

Delta compression uses a previous block of bytes as a reference when compressing the blocks that follow. This approach is increasingly ineffective due to the duplication of byte sequences in modern files. Another delta compression model works on the numerical differences between the bytes in a file: storing a difference value requires fewer bits than storing the original value. Base+Delta is a compression model whose deltas are the numerical differences within blocks of a fixed size. Because it was developed to compress memory blocks, it uses fixed-size blocks and has no special mechanism for file compression in general. This study proposes a compression model that extends the Base+Delta encoding concept so that it can be applied to all file types. The modifications are a dynamic block size, determined with a sliding window, and optimized block headers for both compressed and uncompressed blocks. The test results are promising: almost all tested file formats could be compressed, with a ratio that is modest but consistent across formats, ranging from 0.04 to 12.3. The model also fails to compress files that contain many uncompressed blocks, where the overhead of the additional information stored for uncompressed blocks makes the files larger; the resulting negative ratios of -0.39 to -0.48 are still relatively small and acceptable.
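
The Base+Delta idea summarized in the abstract can be illustrated with a minimal sketch. It covers only the generic fixed-block case, not the dynamic block size or header optimization proposed in the study. The function names, the choice of the smallest byte as the base, and the 16-bit header size are all assumptions made for illustration.

```python
from __future__ import annotations


def encode_block(block: bytes) -> tuple[int, int, list[int]]:
    """Encode one block as (base, bits_per_delta, deltas).

    Assumption: the base is the smallest byte in the block, so every
    delta is non-negative. This is an illustrative choice only.
    """
    if not block:
        raise ValueError("block must not be empty")
    base = min(block)
    deltas = [b - base for b in block]
    # Number of bits needed to represent the largest delta (0 if all equal).
    bits = max(deltas).bit_length()
    return base, bits, deltas


def decode_block(base: int, deltas: list[int]) -> bytes:
    """Rebuild the original bytes by adding each delta to the base."""
    return bytes(base + d for d in deltas)


def packed_size_bits(block_len: int, bits: int, header_bits: int = 16) -> int:
    """Estimate the encoded size: a hypothetical header plus fixed-width deltas."""
    return header_bits + block_len * bits


# Bytes that lie close together need few bits per delta.
block = bytes([100, 101, 103, 102, 100, 104, 101, 103])
base, bits, deltas = encode_block(block)
print(base, bits, deltas)
print(packed_size_bits(len(block), bits), "bits vs", len(block) * 8, "bits raw")
```

Under these assumptions the eight-byte block packs into 40 bits instead of 64. A block whose values span the full byte range, such as `bytes([0, 255])`, still needs 8 bits per delta, so the header makes it larger than the raw data. That is the kind of overhead the abstract reports for files with many uncompressed blocks.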

Article History

Submitted Date: 2023-01-12
Accepted Date: 2023-02-03
Published Date: 2023-02-19

How to Cite

Tommy, Riza, F., Siregar, R., Yeni, M., Elhanafi, A. M., & Nurmadi, R. (2023). Base-Delta Dynamic Block Length and Optimization on File Compression. Journal of Computer Networks, Architecture and High Performance Computing, 5(1), 229-240. https://doi.org/10.47709/cnahpc.v5i1.1993