About

Researcher working on radiation, environmental and health physics at the Japan Atomic Energy Agency's Center for Computational Science and e-Systems.


Remove pesky PDF title pages and other PDF tricks

PDFtk command to remove a title page

This post covers taking pages out of PDF files, binding PDFs together, and removing PDF security. Here are a couple of free programs that perform these tasks.

Taking out pages from a PDF

Sometimes PDFs contain a title page or blank pages that you might want to remove. PDFtk Server is a useful command line tool that can take pages out from a PDF file. To remove the title page from a PDF file, run the following command:

pdftk input.pdf cat 2-end output output.pdf

List the pages you want to keep after the cat keyword (here 2-end, meaning page 2 through to the last page), and change input.pdf and output.pdf to suit.
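
When the pages to remove are scattered through a file, typing the ranges by hand gets tedious. The sketch below is one possible way to build the pdftk arguments in Python. The function names and the need to pass in the total page count are my own assumptions for illustration, not part of PDFtk itself; only the `cat` range syntax shown above is relied on.

```python
import shlex


def compress_to_ranges(pages):
    """Turn page numbers such as [2, 3, 4, 6] into pdftk ranges ['2-4', '6']."""
    ranges = []
    start = prev = None
    for page in sorted(set(pages)):
        if start is None:
            start = prev = page
        elif page == prev + 1:
            prev = page  # still inside a consecutive run
        else:
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = page
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


def pdftk_drop_pages(input_pdf, output_pdf, total_pages, drop):
    """Build the pdftk argument list that keeps every page not in `drop`.

    `total_pages` must be supplied by the caller (hypothetical parameter).
    """
    drop = set(drop)
    keep = [page for page in range(1, total_pages + 1) if page not in drop]
    if not keep:
        raise ValueError("no pages left to keep")
    return ["pdftk", input_pdf, "cat", *compress_to_ranges(keep),
            "output", output_pdf]


if __name__ == "__main__":
    # Drop the title page and page 5 from a 10-page file.
    print(shlex.join(pdftk_drop_pages("input.pdf", "output.pdf", 10, [1, 5])))
```

Run as a script, this prints `pdftk input.pdf cat 2-4 6-10 output output.pdf`, which can be pasted into a terminal or passed to `subprocess.run` as a list.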

Merging PDF files

You may wish to combine a journal article with its supplementary information, or with comment letters and author responses. Again, PDFtk can do this. To combine two PDF files into one:

pdftk input1.pdf input2.pdf cat output combined.pdf

PDFtk can also perform a host of other useful tasks – see its examples page.

Removing PDF Security

Sometimes PDF files come with encryption settings that prevent you from doing the above, or from performing OCR on the document. I find that QPDF works well for removing these security settings, even if you do not know the owner (permissions) password. If the file needs a password just to open it, you must supply that password with QPDF's --password option. To remove the security settings from a file, run:

qpdf --decrypt input.pdf output.pdf

However, QPDF does not always succeed. An alternative is to use a PDF printer driver to ‘print’ an unencrypted PDF. Examples of such software for Windows include CutePDF, Bullzip and PDFCreator. Preview on Mac can also ‘Export as PDF’. Ghostscript can perform this task too:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf

More Powerful Manipulation of PDFs

The Python package pyPdf (since succeeded by PyPDF2 and pypdf) can be used to perform PDF file operations, with the benefit of easy scripting. For instance, when merging many hundreds of PDF sections of a larger document, each with a filename describing one part of the document (front matter, initial pages with Roman numerals, main pages…), I wrote a Python script to merge the files in the correct order.
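
As a rough illustration of that kind of script (not the original one), the sketch below sorts section files into document order and merges them. The file naming scheme (`front_*.pdf`, `roman_<numeral>.pdf`, `main_<number>.pdf`) is a hypothetical stand-in, and the merge step assumes a recent release of pypdf, the current successor to pyPdf, with its `PdfWriter.append` method.

```python
import re
from pathlib import Path

# Hypothetical naming scheme (an assumption, not the original files):
#   front_*.pdf          front matter, kept in alphabetical order
#   roman_<numeral>.pdf  preliminary pages, e.g. roman_iv.pdf
#   main_<number>.pdf    main pages, e.g. main_12.pdf
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'xiv' to 14."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value written before a larger one is subtracted (iv = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def sort_key(path):
    """Order files as front matter, then Roman pages, then main pages."""
    stem = Path(path).stem.lower()
    match = re.fullmatch(r"roman_([ivxlcdm]+)", stem)
    if match:
        return (1, roman_to_int(match.group(1)), stem)
    match = re.fullmatch(r"main_(\d+)", stem)
    if match:
        return (2, int(match.group(1)), stem)
    if stem.startswith("front"):
        return (0, 0, stem)
    return (3, 0, stem)  # anything unrecognised goes last


def merge_in_order(paths, output_pdf):
    """Merge the PDFs in document order using pypdf (pip install pypdf)."""
    from pypdf import PdfWriter  # imported here so sorting works without it

    writer = PdfWriter()
    for path in sorted(paths, key=sort_key):
        writer.append(str(path))
    with open(output_pdf, "wb") as handle:
        writer.write(handle)
```

Sorting numerically rather than alphabetically is the point of the key function: a plain sort would put `main_10.pdf` before `main_2.pdf` and `roman_ix.pdf` before `roman_v.pdf`.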

Edit 9th June 2015

Added Ghostscript command to print to a new PDF file.

About Alex Malins

Alex is a researcher working for JAEA in radiation physics and environmental contamination.

Comments ( 3 )

  1. c, 2 May 2019 11

    Is there a way to remove a pattern of pages? My document is a grouping of 8-page documents, and I want to remove pages 1, 2 and 7 of each 8-page document. So I want to delete 1, 2, 7, 9, 10, 15, ... I suppose I could use Excel to assemble those numbers into a (very long) string and pass it to the PDFtk command line by brute force.

  2. c, 2 May 2019 11

    In the ideal case, I would be able to write "del mod8 1,2,7" or something.

    • Alex Malins, 2 May 2019 11

      Hi c, I don't think there is an easy way to do that with PDFtk alone. You could write a script with batch/bash or suchlike to generate the very long string command to pass to PDFtk. This might be easier than Excel. Or you could use Python and the PyPDF library to do everything in one go. These suggestions all require a little coding ability, but nothing too difficult.
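
For readers with the same problem, here is a minimal sketch of the Python route mentioned in the reply above. The function names and default arguments are made up for this example, and the second function assumes the pypdf package (the current successor to pyPdf) is installed. It drops positions 1, 2 and 7 from every block of 8 pages.

```python
def pages_to_keep(total_pages, period, drop_in_period):
    """Return 1-based page numbers that survive a repeating deletion pattern."""
    drop = set(drop_in_period)
    return [
        page
        for page in range(1, total_pages + 1)
        # Position of this page inside its block, counted from 1.
        if (page - 1) % period + 1 not in drop
    ]


def remove_pattern(input_pdf, output_pdf, period=8, drop_in_period=(1, 2, 7)):
    """Copy input_pdf to output_pdf without the pages matching the pattern."""
    from pypdf import PdfReader, PdfWriter  # needs: pip install pypdf

    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for page_number in pages_to_keep(len(reader.pages), period, drop_in_period):
        writer.add_page(reader.pages[page_number - 1])  # pypdf indexes from 0
    with open(output_pdf, "wb") as handle:
        writer.write(handle)
```

For a 16-page file, `pages_to_keep(16, 8, (1, 2, 7))` gives pages 3, 4, 5, 6, 8, 11, 12, 13, 14 and 16. Joining that list with spaces also produces the long page string that could be placed after `cat` in a PDFtk command instead.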
