Enhanced Parallel bzip2 Compression with Lock-Free Queue
DOI:
https://doi.org/10.15359/ru.31-2.3Keywords:
Computer programming, Data processing, Programming languagesAbstract
Since the general trend nowadays is to have more and more processors (cores) available in each computer, the scalability of the data structures used in parallel programs must be carefully considered in order to guarantee that they take full advantage of the available processors. Because of increased containment, lock-based data structures usually do not perform proportionally as the number of processors increases. The use of well-designed lock-free data structures, like first in-first out, (fifo) queues, can boost the performance of a parallel program when many processors are available. In this work, a parallel version of bzip2, a popular compression and decompression program, is designed and implemented by using lock-free queues instead of the lock-based ones, and applying a two-buffer-output strategy. The performance of lock-free implementation is measured against lock-based implementations. Compression time was measured with different number of processors and different block sizes. Consistent with our hypothesis, the results show that our parallel lock-free implementation outperforms the other implementations.
References
Boost C++ Library (n.d.). (2013). Retrieved from https://svn.boost.org/trac/boost/
Burrows, M., & Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm. (SRC Research Report 124). California: Digital systems research center. http://www.hpl.hp.com/techreports.
Feldman, S. & Dechev, D. (2015). A wait-free multi-producer multi-consumer ring buffer. ACM SIGAPP Applied Computing Review, 15(3), 59-71. http://dx.doi.org/10.1145/2835260.2835264
Gilchrist, J. (2004). Parallel data compression with bzip2. Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems(PDCS). http://gilchrist.ca/jeff/comp5704/Final_Paper.pdf
Huffman, D.A. (1952). A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E. http://dx.doi.org/10.1109/jrproc.1952.273898
Isakov, K. (2005). Bzip2smp. Retrieved from http://bzip2smp.sourceforge.net
Jannesari, A., Pankratius, V. & Tichy, W. (1990). Parallelizing bzip2-a case study in multicore software engineering. IEEE Software, 26(6). Recovered from https://www.semanticscholar.org
Ladan-mozes, E. and Shavit, N. (2004). An optimistic approach to lock-free fifo queues. Proceedings of the 18th International Symposium on Distributed Computing, Springer, Berlin, Germany. http://dx.doi.org/10.1007/978-3-540-30186-8_9
Marcais, G. & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27, 764–770. http://dx.doi.org/10.1093/bioinformatics/btr011
Michael, M., & Scott, M. (1996). Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing. http://dx.doi.org/10.1145/248052.248106
Neal, R., Witten, I. & Cleary, J. (1987). Arithmetic coding for data compression. Communications of the ACM, 30(6), 520-540. http://dx.doi.org/10.1145/214762.214771
Seward, J. (2010). bzip2. Retrieved from http://www.bzip.org/
Valois, D. (1994). Implementing lock-free queues. Proceeding of the Seventh International Conference On Parallel and Distributed Computing Systems, 1994.
Welch, T. (1984). A technique for high-performance data compression. IEEE Computer, 17(6). http://dx.doi.org/10.1109/MC.1984.1659158
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin. 1 (6). http://dx.doi.org/10.2307/3001968
Zhang, D. & Dechev, D. (2016). A Lock-Free Priority Queue Design Based on Multi-Dimensional Linked Lists. IEEE Trans. Parallel Distrib. Syst. 27(3) 613-626. http://dx.doi.org/10.1109/tpds.2015.2419651
Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5). 530-536. http://dx.doi.org/10.1109/tit.1978.1055934
Published
Issue
Section
License
Authors who publish with this journal agree to the following terms:
1. Authors guarantee the journal the right to be the first publication of the work as licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
2. Authors can set separate additional agreements for non-exclusive distribution of the version of the work published in the journal (eg, place it in an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
3. The authors have declared to hold all permissions to use the resources they provided in the paper (images, tables, among others) and assume full responsibility for damages to third parties.
4. The opinions expressed in the paper are the exclusive responsibility of the authors and do not necessarily represent the opinion of the editors or the Universidad Nacional.
Uniciencia Journal and all its productions are under Creative Commons Atribución-NoComercial-SinDerivadas 4.0 Unported.
There is neither fee for access nor Article Processing Charge (APC)