What is OCR?

OCR (Optical Character Recognition) is a technology that converts images of text, such as scanned paper documents saved as .pdf or .tif files, into editable and searchable data. Early OCR devices were developed to give visually impaired people access to printed material by converting it into synthetic speech. The technology has since been refined and is now widely used to “read” the text in scanned documents. All versions of Primafact include OCR capability.

Unlike a text file, an email, or a word processing file, a document scanned into Primafact is stored as an image (really just a picture) in either .tif or .pdf format. When the image is displayed on the screen, you can read it, but to the computer it is just a series of black and white dots. The computer does not recognize any “words” in the image.

So how does OCR “decipher” scanned pages and convert them to editable text? OCR software examines each line of the image and attempts to determine whether a given pattern of black and white dots represents a particular letter or number. Once the dots are translated, Primafact’s OCR program stores the result as plain text in the TEXT tab for that page. As a result, the documents you store in Primafact become searchable, and you can copy and paste their text into other documents.
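To make the idea of matching dots to characters more concrete, here is a toy Python sketch. It is purely illustrative and is not how Primafact’s OCR engine (or any production OCR engine) works: it assumes each character is a tiny 5-by-3 grid of dots and simply picks whichever hypothetical template in `TEMPLATES` agrees with the most dots.

```python
# Toy illustration only: each glyph is a 5x3 grid of dots (1 = black, 0 = white).
# TEMPLATES is a hypothetical, hand-made alphabet of three letters.
TEMPLATES = {
    "I": ("111",
          "010",
          "010",
          "010",
          "111"),
    "L": ("100",
          "100",
          "100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010",
          "010",
          "010"),
}


def score(glyph, template):
    """Count how many dots agree between a scanned glyph and a template."""
    return sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )


def recognize(glyph):
    """Return the template letter whose dot pattern best matches the glyph."""
    return max(TEMPLATES, key=lambda letter: score(glyph, TEMPLATES[letter]))


# A slightly smudged "L": one stray black dot in the top-right corner.
smudged_l = ("101",
             "100",
             "100",
             "100",
             "111")
print(recognize(smudged_l))  # L
```

Real OCR engines work with much larger images and far more sophisticated recognition methods, but the sketch shows the basic trade-off: a single stray mark can still be tolerated, while enough smudging or skew will eventually make a different character the best match.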

As wonderful as OCR is, it is important to understand both its capabilities and its limitations. It is a great tool, but it is not perfect. A scan of a poorly photocopied document may yield very poor results: the photocopy may have been made at a slight angle, may have light and dark patches, or may be a photocopy of a photocopy or of a fax. In such cases, the OCR program may be unable to recognize the characters reliably and therefore cannot translate them into meaningful text.

Handwriting is also problematic. Because everyone’s handwriting is different, OCR software generally cannot read it reliably. For example, a Motor Vehicle Accident (MVA) report form completed by a police officer will contain both printed text and the officer’s handwritten details. The OCR program will translate the printed text but not the handwritten information. Primafact works around this limitation by letting you annotate the page with manually typed text describing the handwritten portion(s). You can handle handwritten clinical notes and records the same way.

Primafact’s OCR software is an extremely powerful tool that can create vast amounts of text data that is searchable and readily accessible. As long as its limitations are understood, OCR can greatly benefit law firms of all sizes.