Friday, 24 June 2011

Parameter Substitution

That is a bloody boring title for a blog post. AAAAAnyway, it means that when you are farting around with variables holding filenames in bash scripts, or single commands, you may want to edit the file name. For instance, remember in the PDF compression script we strip the file extension off the filename and add .pdf to create the output filename. Very useful.

This post was prompted by trying to compress a 1000+ page pdf document. The pain was that the pdfimages command only switches to 4 digit file names at page 1000. So for 1-999 you get 001 etc instead of 0001. This has the potential to screw everything up when you come to compile your final pdf, because the files are picked up in alphabetical order, not numerical order. You do not want page 1000 to be before page 200, just because it starts with a lower digit.

So how do we insert the missing 0?

for file in imageroot-???.*; do mv "$file" "imageroot-0${file#imageroot-}"; echo "$file"; done
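
If you would rather try that on dummy files before letting it loose on a thousand real pages, here is a minimal sketch. The scratch directory and the three made-up file names are my assumptions for illustration, not part of the real job. The [#] inside [${file#imageroot-}] chops [imageroot-] off the front of the name, and then [imageroot-0] gets glued back on.

```shell
# Sketch only: the scratch directory and file names are invented for the demo.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    # Two three-digit pages and one four-digit page, all empty dummies.
    touch imageroot-001.ppm imageroot-200.ppm imageroot-1000.ppm

    # ??? matches exactly three characters, so imageroot-1000.ppm is skipped.
    for file in imageroot-???.*; do
        # ${file#imageroot-} is the name with the leading "imageroot-" removed.
        mv "$file" "imageroot-0${file#imageroot-}"
    done

    # List the result so you can eyeball the new names.
    ls
)
```

After the loop every name carries four digits, so page 200 lists before page 1000 again.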

Friday, 17 June 2011

Gnuplot

If you are using the last LTS of Ubuntu, then your Gnuplot version will be out of date. You can't get fancy stuff like transparent surface plots. To download and install the latest version you need to run these commands:

cvs -d:pserver:anonymous@gnuplot.cvs.sourceforge.net:/cvsroot/gnuplot login
cvs -z3 -d:pserver:anonymous@gnuplot.cvs.sourceforge.net:/cvsroot/gnuplot co -P gnuplot
cd gnuplot
./prepare
./configure --with-readline=gnu
make
sudo make install

This installs Gnuplot behind the back of the Ubuntu packaging system though, which can clash with the packaged version - so be careful.

Friday, 10 June 2011

Ktikz

If you are doing quite a bit of LaTeX work with the TikZ graphics libraries then you will probably like a wysiwyg program so that you can tweak your wonderful creations. Such a program for Ubuntu is ktikz.

To install it you need the latest version of TeX Live. If using Ubuntu Natty or later then I think you get the latest version by:

sudo apt-get install texlive

With Maverick or earlier, you need to install the latest version manually to get all the bells and whistles:

wget http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
tar -xzvf install-tl-unx.tar.gz 
cd install-tl-*/
sudo ./install-tl

The install will then run through - takes about an hour to download the stuff. You can also download the DVD image via torrent, which would be a good way to have a backup for quick reinstall. Once installed, you need to make sure you update your path:

PATH=/usr/local/texlive/2010/bin/i386-linux:$PATH
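
If you end up running that line more than once, the same directory gets stacked onto your path again and again. A minimal sketch of one way round that is below. The helper name [prepend_path] is my own invention, not something TeX Live provides, and the directory is just the one from the line above.

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is
# already somewhere in PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;      # not present, add it at the front
    esac
}

prepend_path /usr/local/texlive/2010/bin/i386-linux
prepend_path /usr/local/texlive/2010/bin/i386-linux   # second call changes nothing
export PATH
```

Setting the path like this only lasts for the current terminal. Putting the lines in a startup file such as ~/.profile is the usual way to make it stick.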

Installing the ktikz software is a bit easier. You just run the following commands to grab the dependencies in case you do not have them. BTW this is for Lucid only. There isn't a package for Maverick or Natty yet, but you could roll your own.

sudo apt-get install build-essential cmake libqt4-dev qt4-dev-tools libpoppler-qt4-dev kdelibs5-dev pgf preview-latex-style
wget http://www.hackenberger.at/ktikz/ubuntu_lucid/ktikz_0.10-1_i386.deb
sudo dpkg -i ./ktikz_0.10-1_i386.deb


You may need to preface some of the scripts with libraries to load:

\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{intersections}
\usetikzlibrary{through}

Friday, 3 June 2011

Compressing PDF files

If you have got hold of a PDF file which comprises lots and lots of images and nothing else, it may well be huge if the images are not compressed. You can fix this at the command line in Ubuntu, as follows:

You will need:

sudo apt-get install pdftk imagemagick

First you need to unpack the images from the PDF. Start this from a blank directory because we are going to automatically do things to all the files in this directory with a specific name.

pdfimages /path/to/filename.pdf imageroot

You get lots of:

imageroot-[three digit number].somethings

These somethings are either ppm or pbm files. These are very, very basic image dumps - like a bmp image. One is for texty stuff (pbm, pure black and white), and the other is for imagy stuff (ppm, colour). I therefore use these filetypes as wildcards in the next command, but you would need to replace them with whatever image types this command actually generates if your results are different, for example if you choose to output jpeg files by using the [-j] option.

Next you compress each image to a pdf, one page long. This will not work properly if you did not start with a blank directory, because the command processes EVERY file in this directory with the extensions produced by the last command.

For colour sources, you need jpg compression:

for file in *.{ppm,pbm}; do convert -compress jpeg -quality 50 "$file" "${file%.???}.pdf"; echo "$file"; done

That applies the commands between the [do] and the [done] bits to all [file]s with the extension of either [{,}] [ppm] or [pbm]. The commands in the middle [convert] the image files into pdf files using [jpeg] [compress]ion with [quality] [50]%. You can obviously change the quality percentage to get the best results depending on your source material. The result is sent to a file whose name is constructed from the input [file] variable [$] less whatever three letter [???] extension [.] it has, plus the characters [.pdf]. The next bit just prints out the last filename processed so you can make sure it is doing something if you are processing LOTS of files.
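
If the [${file%.???}] bit looks like line noise, here is a minimal sketch that only prints what would be run, without touching any files or calling convert. The helper name [pdf_name] and the sample file names are assumptions I made up for the demo.

```shell
# Hypothetical helper: work out the output pdf name for one image file.
pdf_name() {
    # ${1%.???} removes a trailing dot plus exactly three characters.
    printf '%s\n' "${1%.???}.pdf"
}

# Dry run on invented names: echo the command instead of running it.
for file in imageroot-000.ppm imageroot-001.pbm; do
    echo "convert -compress jpeg -quality 50 $file $(pdf_name "$file")"
done
```

Only a three letter extension gets stripped this way. If your files might have longer or shorter extensions, [${file%.*}] strips whatever follows the last dot instead.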

For black and white sources you need fax compression:

for file in *.{ppm,pbm}; do convert -alpha off -monochrome -compress Group4 -quality 100 "$file" "${file%.???}.pdf"; echo "$file"; done

This is basically the same approach as last time, just with a change to the [convert] command. This time the [jpeg] stuff is gone, and we have the [-alpha off -monochrome -compress Group4 -quality 100] bit instead. I can't get the quality setting to do anything here. The Group4 refers to the particular flavour of fax compression (CCITT Group 4) which is applied.
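
If your PDF is a mixture of colour and black and white pages, one idea is to pick the options per file from its extension. This is only a sketch and not something I have run on a real document: the helper name [compress_opts] is made up, it assumes pbm files are the black and white ones and everything else is colour, and it echoes the commands rather than running them.

```shell
# Hypothetical helper: choose convert options from the file extension.
# Assumption: .pbm means black and white, anything else is treated as colour.
compress_opts() {
    case "$1" in
        *.pbm) echo "-alpha off -monochrome -compress Group4 -quality 100" ;;
        *)     echo "-compress jpeg -quality 50" ;;
    esac
}

# Dry run on invented names. The unquoted $(...) is deliberate so that the
# options are split into separate words.
for file in imageroot-000.ppm imageroot-001.pbm; do
    echo convert $(compress_opts "$file") "$file" "${file%.???}.pdf"
done
```

Once the printed commands look right, taking the [echo] out of the loop would run them for real.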

Finally we take all of those individual pdfs and we combine them into one big one. Again, this will not work properly if you did not start with a blank directory. This copies EVERY pdf which matches the search string (the imageroot*.pdf bit where * means anything) into the final pdf. The classic error here would be making your image root name too similar to your original pdf name, with the result that the built pdf incorporates the original - hardly reducing the file size!

pdftk imageroot*.pdf cat output name_of_final_file.pdf

That uses the [pdf] [t]ool[k]it program to take every [*] file that starts with [imageroot] and ends with [.pdf] and con[cat]enates them into the [output] file named [name_of_final_file.pdf].
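
To dodge the classic error mentioned above, you can check a file name against the same pattern before you build anything. A minimal sketch follows. The helper name [safe_from_glob] and the example file names are invented for illustration.

```shell
# Hypothetical check: succeeds only if the given name would NOT be
# picked up by the imageroot*.pdf pattern.
safe_from_glob() {
    case "$1" in
        imageroot*.pdf) return 1 ;;   # would be swept into the final pdf
        *) return 0 ;;
    esac
}

# Try it on two invented names for the original document.
for name in original_scan.pdf imageroot_original.pdf; do
    if safe_from_glob "$name"; then
        echo "$name is safe"
    else
        echo "$name would end up inside the final pdf"
    fi
done
```

Running [ls imageroot*.pdf] just before the pdftk command is another quick way to see exactly which files are about to be combined, and in what order.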

Simples.