Converting a scanned TIFF document to PDF and creating text searchable PDFs


Published: December 08, 2014

You can follow these steps to create a text searchable PDF document if your scanner only outputs TIFF files. If your scanner creates PDF files but doesn’t perform OCR to make text searchable, skip to the last step.

Convert TIFF to PDF

ImageMagick comes with a command-line tool, magick, that can do this.

magick convert scanned.tiff scanned.pdf

This command creates the PDF file scanned.pdf from the TIFF file produced by the scanner.
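If you have several scans to convert, the same command can be wrapped in a small shell function. This is only a minimal sketch: the function name tiffs_to_pdfs and the file names in the usage comment are made up for illustration, and it assumes the magick tool is on your PATH.

```shell
# Convert each TIFF given as an argument to a PDF with the same base name,
# e.g. scan1.tiff -> scan1.pdf. Prints the name of each PDF it creates.
# (tiffs_to_pdfs is a hypothetical helper name, not part of ImageMagick.)
tiffs_to_pdfs() {
    for tiff in "$@"; do
        pdf="${tiff%.*}.pdf"          # strip the extension and add .pdf
        magick convert "$tiff" "$pdf" || return 1
        echo "$pdf"
    done
}

# Example usage:
#   tiffs_to_pdfs scan1.tiff scan2.tif
```

Quoting "$tiff" and "$pdf" keeps file names containing spaces intact.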

Optional: Rotate the PDF Pages

Sometimes the scanned pages need rotating to the correct orientation. Use PDFtk to rotate the pages. To rotate every page in the scanned PDF by 90° anti-clockwise, run:

pdftk scanned.pdf cat 1-endwest output rotated.pdf

Individual pages can also be selected and rotated as necessary; see the PDFtk examples.
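As a rough sketch of rotating selected pages, the two functions below use PDFtk's cat operation with a rotation keyword (north, east, south or west) appended to a page range. The function names and file names are hypothetical, and the first function assumes the input has at least two pages.

```shell
# Rotate only page 1 by 90° clockwise (east) and copy pages 2 onward unchanged.
# Assumes the input PDF has at least two pages.
rotate_first_page() {
    pdftk "$1" cat 1east 2-end output "$2"
}

# Rotate every page by 180° (south), e.g. for a stack scanned upside down.
rotate_all_upside_down() {
    pdftk "$1" cat 1-endsouth output "$2"
}

# Example usage:
#   rotate_first_page scanned.pdf rotated.pdf
#   rotate_all_upside_down scanned.pdf rotated.pdf
```

PDFtk always writes a new file, so the output name should differ from the input name.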

Perform Optical Character Recognition

For this step I resort to a copy of Acrobat Pro.

I would have liked to achieve good quality output for this step using open source software. Solutions do exist, mainly using Tesseract to perform the OCR and then forming a new PDF file with a text searchable layer hidden underneath the scanned images. See, for example, Voelkel’s and OCRmyPDF solutions.

However, despite several reasonable attempts, I couldn’t get either to work well. The quality of the OCR output I was getting from Tesseract was lower than that from Acrobat. Acrobat also has the advantage that it applies small rotations to the pages to make sure the text is horizontal. So eventually I gave up on the open source route and now use Acrobat.
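For readers who still want to try the open source route, an OCRmyPDF call might look roughly like the sketch below. This is not the method used in this post, and the output quality may well match the disappointing results described above. The function name ocr_pdf and the file names are hypothetical, and the English language pack (eng) is an assumption. The --deskew option asks OCRmyPDF to straighten slightly skewed pages, and --rotate-pages asks it to fix pages that are misoriented.

```shell
# Add a searchable text layer to a PDF with OCRmyPDF (which uses Tesseract).
# Usage: ocr_pdf INPUT.pdf OUTPUT.pdf
# Assumes ocrmypdf and the English Tesseract language data are installed.
ocr_pdf() {
    ocrmypdf --deskew --rotate-pages -l eng "$1" "$2"
}

# Example usage:
#   ocr_pdf rotated.pdf searchable.pdf
```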

Note that Acrobat can perform OCR on any PDF file. This is very useful for making old journal articles text searchable when the download offered by the publisher is not.

Edits

24th August 2018 - Updated ImageMagick command “convert” to “magick convert”



Alex Malins