makepdfreader: Bind papers into a single file with page numbering and an automatically generated table of contents.

In practice, the metadata of PDF files is rarely set well. Still, you sometimes want to bind many PDF files into a reader or book, for your students or for your own research or editing work. For example, you might want to combine the papers of a workshop into a single PDF file without manual tuning, while still being able to reference every page and to look up papers quickly by title or author.

For this situation, a colleague and I developed the tool makepdfreader, which is essentially a single shell script.

If you run it in a directory full of PDF files, it first asks for a title for the reader. It then creates a LaTeX file that contains an explicit table of contents, including an image for each paper, and the inclusion commands for the PDF files. LaTeX adds the page numbering automatically. ImageMagick's convert utility extracts the title section from each paper so that it can be included in the final PDF.

The core of this script looks like this:

for f in *.pdf; do
	# Skip the output of a previous run.
	if [ "$f" = "reader.pdf" ]; then continue; fi
	# Render a strip of the first page as the title image.
	convert "$f"[0] -crop 100%x35%+0+50% "$i.png"
	pngs="$pngs $i.png"
	# Append the title image, the TOC entry and the PDF inclusion to reader.tex.
	echo "\ltocstuff{\includegraphics[valign=T,width=175pt,frame]{$i}}\addcontentsline{toc}{section}{}\includepdf[scale=0.95,pages=-,pagecommand={\thispagestyle{readerstyle}}]{$f} \cleardoublepage" >> reader.tex
	let i=i+1
done

As you can see, it goes through all PDF files in the current directory, creates a PNG from the first page of each PDF using convert, and then appends the needed LaTeX commands to reader.tex.

You can easily fine-tune aspects such as how the title is cropped.
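As a sketch of such tuning, the crop geometry can be pulled into a small helper so that the height of the strip and its vertical offset become parameters. The names crop_geometry and title_strip are my own for this illustration and are not part of makepdfreader. As far as I read the ImageMagick geometry syntax, the percent sign scales only the width and height, while the offsets are counted in pixels, so the helper takes the offset as a pixel value.

```shell
# Hypothetical helper: build a -crop geometry for a full-width strip.
# $1 = strip height in percent of the page, $2 = vertical offset in pixels.
crop_geometry() {
	printf '100%%x%s%%+0+%s\n' "$1" "$2"
}

# Hypothetical wrapper: render the title strip of the first page of a PDF.
# $1 = input PDF, $2 = output PNG, $3 = height in percent, $4 = offset in pixels.
title_strip() {
	convert "$1[0]" -crop "$(crop_geometry "$3" "$4")" +repage "$2"
}
```

For example, title_strip paper.pdf title.png 25 0 should keep roughly the top quarter of the first page.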


To install the script, copy it to /usr/local/bin, or download and install the .deb package, which does essentially the same. You also need pdflatex and convert (part of ImageMagick) installed on your system.
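A quick way to check both prerequisites before the first run could look like the following sketch. The function name missing_tools is hypothetical and not part of the script.

```shell
# Print every command from the argument list that is not found on the PATH.
missing_tools() {
	for tool in "$@"; do
		command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
	done
}
```

Calling missing_tools pdflatex convert prints nothing when both tools are available, and otherwise prints the names of the missing ones.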

Some Tips

The files are included in the alphabetical order of their filenames. Filenames are treated differently by bash and LaTeX, and we did not work out how to escape them correctly in both worlds, so you will want filenames without spaces or special characters.
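To find out whether a directory contains problematic names, something like the following sketch may help. It treats everything except letters, digits, dots, underscores, and hyphens as "special", which is my own conservative assumption rather than a rule taken from the script. The function name unsafe_pdf_names is hypothetical as well.

```shell
# List the PDFs in the given directory (default: the current one) whose names
# contain anything other than letters, digits, dots, underscores, or hyphens.
unsafe_pdf_names() {
	dir="${1:-.}"
	for f in "$dir"/*.pdf; do
		[ -e "$f" ] || continue        # the glob did not match anything
		name="${f##*/}"                # strip the directory part
		case "$name" in
			*[!A-Za-z0-9._-]*) printf '%s\n' "$name" ;;
		esac
	done
}
```

If the function prints nothing, the filenames should be safe for both bash and LaTeX.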

To get safe filenames, I use the following one-liner, which renames the PDFs in their alphabetical order before I apply the script. Because the files are renamed, you should do this only on a copy, not in the original directory.

> let i=0; for f in *.pdf; do mv "$f" $(printf "%02d" $i).pdf; let i=i+1; done
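If you prefer not to touch the originals at all, a variant that copies instead of renaming could look like this sketch. The function name number_pdfs and its two arguments (source and destination directory) are assumptions for illustration only.

```shell
# Copy every PDF from $1 into $2 as 00.pdf, 01.pdf, ... in glob (alphabetical) order.
# The originals in $1 are left untouched.
number_pdfs() {
	src="$1"; dest="$2"; n=0
	mkdir -p "$dest" || return 1
	for f in "$src"/*.pdf; do
		[ -e "$f" ] || continue        # the glob did not match anything
		cp -- "$f" "$dest/$(printf '%02d' "$n").pdf" || return 1
		n=$((n + 1))
	done
}
```

After number_pdfs . numbered you would change into the directory numbered and run makepdfreader there.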

You can also impose other orderings by using ls, cat index.txt, or similar constructs instead of *.pdf. For cat index.txt, you create a file index.txt that lists all the filenames in the desired order. Filenames with spaces or special characters are still not allowed.
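As a sketch of the index.txt idea, the following helper reads the file line by line and runs a command once per listed filename. The name for_each_indexed is hypothetical, and the script itself does not work this way out of the box. Reading line by line keeps the shell side robust, but the LaTeX side still needs plain filenames.

```shell
# Run a command once per filename listed in an index file, in file order.
# $1 = index file (one filename per line), remaining arguments = command to run.
for_each_indexed() {
	index="$1"; shift
	while IFS= read -r f || [ -n "$f" ]; do
		[ -n "$f" ] || continue        # skip empty lines
		"$@" "$f" </dev/null           # keep the command from eating the index
	done < "$index"
}
```

For example, for_each_indexed index.txt echo prints the filenames in the order given in index.txt.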

Notes for Ubuntu users (and some other Linuxes reporting convert: not authorized errors)

On a default installation of Ubuntu, ImageMagick no longer converts PDFs, for security reasons. To use the script, you need to edit /etc/ImageMagick-6/policy.xml and comment out the line that disables PDF support, changing it from

<policy domain="coder" rights="none" pattern="PDF" />

to, for example,

<!--  <policy domain="coder" rights="none" pattern="PDF" /> -->
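If you would rather not edit the file by hand, the change can be sketched with sed as follows. The function name enable_pdf_coder is hypothetical, and the pattern assumes that the line looks exactly like the one shown above, apart from leading whitespace. A backup copy with the suffix .bak is kept next to the file.

```shell
# Hypothetical helper: comment out the line that disables the PDF coder.
# $1 = path to policy.xml; a backup is written to the same path plus ".bak".
enable_pdf_coder() {
	sed -i.bak \
		's|^\([[:space:]]*\)\(<policy domain="coder" rights="none" pattern="PDF" */>\)|\1<!-- \2 -->|' \
		"$1"
}
```

On a real system the file belongs to root, so the sed command would have to be run with sudo. A line that is already commented out is left alone, because the pattern only matches lines that start with the policy tag.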