Friday, 22 January 2010

Command Line Compression

In Microsoft-centric operating systems, compressing files is reasonably straightforward. The software used is fairly mature, and the task is usually as simple as right-clicking on a folder or file and choosing the relevant compression task from the context menu. You then end up with a file containing your compressed data, with a file extension that tells you what the compression type is (.rar, .zip, .7z, etc.).

Perhaps unsurprisingly by now, Linux is a little more complicated. You will usually encounter a compressed file you want to uncompress before you find the need to start compressing your own data. You will probably download this file. Its name will be something like:

[name of archive].tar.bz2
[name of archive].tar.gz
So what is going on here then? Well, in Linux you first have a bunch of files which you 'tar', and then you compress the result. Or, the other way around, you first have a compressed file which you uncompress, and then un-'tar' into your bunch of files.

A tar file is, as far as I can gather, a file created using the tar command which contains other files. There is something deeply unsatisfying about that explanation, but it is the best I have. For our present purpose, it is also the only explanation you need. To unpack a tar file you run the following command in the directory where the tar file is:

tar -xf [name of tar file]
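If you want to convince yourself what a plain tar file actually holds, here is a minimal sketch. The scratch directory and the file names are made up for illustration, and it borrows two other letters: 'c' to create an archive and 't' to list what is inside one without unpacking it.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
echo "hello" > notes/a.txt
echo "world" > notes/b.txt

# Bundle the directory into a plain, uncompressed tar file.
tar -cf notes.tar notes

# 't' lists the contents without extracting anything.
listing=$(tar -tf notes.tar)
echo "$listing"

# Extract into a separate directory to show the files come back intact.
mkdir unpacked
tar -xf notes.tar -C unpacked
cat unpacked/notes/a.txt
```

The '-C' option tells tar which directory to unpack into, which saves cluttering up wherever you happen to be.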

The 'x' means e[x]tract, and the 'f' means the [f]ile name follows immediately. That won't work for our files though, because they are not just tarred, they are feathered as well. Sorry, not feathered, compressed. The last section of the extension tells you about the type of compression used. If it says .gz, that means it has been gzipped. If it says .bz2, it has been bzipped with version 2 of that software. Generally speaking, bz2 files are smaller than the equivalent gz file, take less time to download and take up less space. Why ALL files are not compressed using bzip2 I have no idea. It is bizarre. Anyway, to uncompress at the same time as unpacking from the 'tar' file, you add 'g' for a gz file and 'b' for a bz2 file. It's obvious, isn't it? No, sorry, it is 'z' for a gz file (which makes sense … just) and, for bz2, 'j' for some unfathomable reason. So:

tar -zxf [name of tar].gz
tar -jxf [name of tar].bz2

You may want to throw a 'v' for [v]erbose into the command, which will print the name of each file as it is extracted. This way you can check the system is actually doing something and has not crashed. The command ends up as:

tar -zxvf [name of tar].gz
tar -jxvf [name of tar].bz2
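To see what the 'v' actually gives you, here is a small practice run. It is only a sketch: the scratch directory and the 'project' folder are invented for the example, and the archive is built on the spot so there is something safe to extract.

```shell
# Hypothetical scratch directory and names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo "some text" > project/readme.txt

# Make a gzipped tar to practise on, then remove the original folder.
tar -zcf project.tar.gz project
rm -r project

# Extract it; 'v' prints each entry as it is unpacked.
# Some versions of tar print this on stderr, so capture both streams.
extracted=$(tar -zxvf project.tar.gz 2>&1)
echo "$extracted"

# The folder is back, with its contents.
cat project/readme.txt
```

The exact look of the verbose output varies a little between versions of tar, but each extracted path should appear on its own line.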
Now, when it comes to compressing your files, it is just a matter of tweaking the command. You use 'c' for [c]reate instead of 'x' for e[x]tract. Other than that, after the name of the archive you want to compress into, you list the files or directories you want to add. Hence:
tar -zcvf [name of archive].tar.gz [list of files or folders to compress and add]
tar -jcvf [name of archive].tar.bz2 [list of files or folders to compress and add]
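Here is a rough way to try both commands and compare the results for yourself. Treat it as a sketch: the 'logs' folder and its contents are generated just for the test, and the bzip2 step is skipped if the bzip2 program (which tar calls behind the scenes) is not installed.

```shell
# Hypothetical scratch directory and names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir logs

# Generate some repetitive text so there is something worth compressing.
for i in $(seq 1 2000); do
    echo "line $i of a fairly repetitive log file"
done > logs/app.log

# Create a gzipped archive of the folder.
tar -zcf logs.tar.gz logs

# Create a bzip2 archive too, but only if bzip2 is available.
if command -v bzip2 >/dev/null 2>&1; then
    tar -jcf logs.tar.bz2 logs
fi

# Compare the sizes (in bytes) of the original file and the archives.
wc -c logs/app.log logs.tar.*
```

How much smaller each archive turns out depends heavily on what you feed it. On large archives bzip2 usually takes noticeably longer than gzip, which may be part of the answer to why not everything uses it.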
All in all, not as painful as it first appears. Having said that, I had to go through a lot of logical deduction to work out all of the above. As I have discovered previously, the actual manual page for the tar command is strong stuff. The opening paragraph, for instance, is:

     The first argument to tar should be a function; either one of the letters
Acdrtux, or one of the long function names. A function letter need not
be prefixed with ``-'', and may be combined with other single-letter
options. A long function name must be prefixed with --. Some options
take a parameter; with the single-letter form these must be given as
separate arguments. With the long form, they may be given by appending
=value to the option.

Which is clear as mud to someone who just wants to open a compressed file.
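For what it is worth, here is my best reading of that paragraph, as a sketch you can run. The scratch directory and 'demo.txt' are invented for the example. The long names used (--create, --list, --gzip, --file) are, as far as I can tell, the spelled-out equivalents of 'c', 't', 'z' and 'f'.

```shell
# Hypothetical scratch directory and file name, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
echo "example" > demo.txt

# Single letters with a dash, as used earlier in this post.
tar -zcf dash.tar.gz demo.txt

# The same letters without the dash ("need not be prefixed").
tar zcf nodash.tar.gz demo.txt

# Long names, each prefixed with --, with the file name appended as =value.
tar --create --gzip --file=long.tar.gz demo.txt

# All three archives list the same contents.
tar -ztf dash.tar.gz
tar ztf nodash.tar.gz
tar --list --gzip --file=long.tar.gz
```

The three forms do the same job, so it comes down to which one you find least painful to remember.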


  1. mate, you must be the first real person I've read posting about Linux. This blog should be required reading for everyone in development, especially the bastards who wrote the mime-type handling xcomp crap. While I'm at it, here's a shout out to the geniuses who thought visudo was a sensible way to manage that little gem - you're numpties.

  2. I don't know if un-tarring a compressed file really requires any additional flags. I am using tar xf on everything tar-compatible, compressed or not, and it works. Perhaps this is a newer tar improvement (the article is a bit old), or perhaps it was always like that, but now it works without much problem.

  3. Jarek, you are probably right, and it doesn't need the [j] or [z] flag to tell tar what kind of archive you are feeding it. On the other hand, with Linux once I have gone through the blood sweat and tears necessary to find a sodding command that actually works, I ain't messing with it.