Heckroth Industries

Compression

gzip v bzip2

I have recently been looking at revamping our backup setup and I had to make a decision on the compression method to be used. Should I be using tar with gzip, bzip2 or a combination of the two? They were the only two real contenders, mainly because both are stable and supported as standard in tar. The last thing I want to do with backups is to use an exotic compression method, as I want to be sure I will be able to restore the backups.

So the first thing I did, on a mixture of servers, was to time how long tar took to create the compressed tarball with each tool. The results showed that, for our data, bzip2 was considerably slower than gzip at compressing the tar. Looking at the size of the final tarballs also showed that bzip2 produced the smaller ones. So do I want faster generation of the tarballs or smaller resulting tarballs?
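For anyone wanting to repeat this kind of comparison on their own data, here is a rough sketch using Python's standard tarfile module rather than the tar command itself. The function names time_tarball and compare are made up for illustration, and the numbers it gives will not match tar run from the shell exactly, but the relative difference between the two methods should be visible.

```python
import os
import tarfile
import tempfile
import time


def time_tarball(source_dir, mode):
    """Build a compressed tarball of source_dir and return (seconds, bytes).

    mode is "gz" for gzip or "bz2" for bzip2. The tarball is written to a
    temporary file and deleted afterwards, as only the timing and the
    resulting size are of interest here.
    """
    fd, out_path = tempfile.mkstemp(suffix=".tar." + mode)
    os.close(fd)
    try:
        start = time.perf_counter()
        with tarfile.open(out_path, "w:" + mode) as tar:
            # normpath strips any trailing slash so basename is not empty
            tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))
        elapsed = time.perf_counter() - start
        return elapsed, os.path.getsize(out_path)
    finally:
        os.remove(out_path)


def compare(source_dir):
    """Return {"gz": (seconds, bytes), "bz2": (seconds, bytes)} for source_dir."""
    return {mode: time_tarball(source_dir, mode) for mode in ("gz", "bz2")}
```

Calling compare() on a directory gives the time taken and tarball size for each method, which is the same pair of measurements described above.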

The final solution I decided on is to use a mixture of both gzip and bzip2. If it is a small quantity of data then bzip2 is used, as the extra time it takes to produce a small tarball is negligible. For backups of large sets of data gzip is used, as the extra time bzip2 takes to compress them is greater than the time that would be saved by pushing the smaller bzip2 tarball across the network to the server responsible for writing the backups to tape.
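The rule itself is simple enough to sketch in a few lines of Python. This is only an illustration: the 1 GiB threshold and all of the names here are assumptions, not the values or scripts used on our servers. The total_backup_seconds helper just expresses the trade-off as compression time plus the time to push the tarball across the network.

```python
import os

# Hypothetical cut-off between "small" and "large"; pick a value based on
# timings taken on your own data.
SMALL_DATA_THRESHOLD = 1 * 1024**3  # 1 GiB


def directory_size(path):
    """Return the total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def choose_compression(total_bytes, threshold=SMALL_DATA_THRESHOLD):
    """Small data sets get bzip2 ("bz2"), large ones get gzip ("gz")."""
    return "bz2" if total_bytes < threshold else "gz"


def total_backup_seconds(compress_seconds, tarball_bytes, network_bytes_per_second):
    """Time to compress plus time to send the tarball to the tape server."""
    return compress_seconds + tarball_bytes / network_bytes_per_second
```

Feeding measured timings and tarball sizes for both methods into total_backup_seconds() is one way to check where the threshold should sit for a given network link.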

Jason — 2011-01-07