How to copy text from images and scanned documents

By techguy | Mar 24, 2009

You cannot copy text directly from images or scanned documents. You first need to run OCR (Optical Character Recognition) on the image or scanned document. For those who do not know, OCR is the recognition of printed or written text characters by a computer.

There are many products on the market that have OCR capabilities. But if you have Microsoft Office installed on your machine, you do not need to purchase an OCR product or even install a free one, as some Microsoft Office applications have a built-in OCR engine.

You can OCR documents by using the Microsoft Office Document Imaging tool, and if you have Microsoft Office 2007 installed, you can also use Microsoft Office OneNote.

We will now see how to use these applications to OCR images or scanned documents.

Copy text from images or scanned documents using Microsoft Office Document Imaging:
  1. Click on Start > All Programs > Microsoft Office > Microsoft Office Tools > Microsoft Office Document Imaging.
  2. In Microsoft Office Document Imaging, click the File menu and select the Open… option to open the document that you want to OCR. The document will now be displayed.
  3. Click the Tools menu and select the Recognize Text Using OCR… option. A dialog will be launched.
  4. Select the appropriate option and click OK to begin the OCR.
  5. You can now copy text from this document. If you wish to export the extracted text to a Word file, click the Tools menu and select the Send Text to Word… option.
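
If you need to repeat the steps above for many files, Document Imaging can also be driven from a script through COM automation. The following is only a minimal sketch, not an official recipe. It assumes a Windows machine with Microsoft Office Document Imaging installed and the third-party pywin32 package available; the function name ocr_with_modi and its dispatch parameter are made up for this illustration.

```python
def ocr_with_modi(path, dispatch=None):
    """Return the OCR text of every page in `path` via the MODI COM object.

    `dispatch` is a hypothetical hook for supplying a COM factory; by default
    pywin32's Dispatch is used (assumption: pywin32 and MODI are installed).
    """
    if dispatch is None:
        # Imported lazily so this module can still be loaded on a machine
        # that does not have pywin32.
        from win32com.client import Dispatch as dispatch

    doc = dispatch("MODI.Document")
    doc.Create(path)  # open the scanned file (same as File > Open)
    try:
        doc.OCR()  # same as Tools > Recognize Text Using OCR, default options
        images = doc.Images
        # Each page is an image; its recognized text lives in Layout.Text.
        pages = [images.Item(i).Layout.Text for i in range(images.Count)]
    finally:
        doc.Close(False)  # close without saving changes
    return "\n".join(pages)
```

Calling ocr_with_modi with the full path to a scanned TIFF should return the recognized text as one string, with one line break between pages.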

With this method you will not be able to copy text from a document scanned as a PDF file. To do that, you will need to use Microsoft Office OneNote (available in MS Office 2007).

Copy text from images or scanned documents using Microsoft Office OneNote:
  1. Click on Start > All Programs > Microsoft Office > Microsoft Office OneNote 2007.
  2. In Microsoft Office OneNote, click the Insert menu and select the Files as Printouts… option to open the document that you want to OCR. The document will now be displayed.
  3. Right click the displayed document and select the Copy Text from All the Pages of the Printout option.
  4. Once the text is copied you can then paste it in Notepad or Word.

That’s it!

Please note that the quality of the extracted text depends on the quality of the scanned document. If the document is not scanned properly, the OCR engine will not be able to recognize the text and will give you gibberish instead.
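
If you paste the extracted text into a script, a quick sanity check can flag a bad scan before you rely on the result. The sketch below is a rough heuristic of my own, not something built into Office: it counts how many tokens look like ordinary words or numbers. The function name looks_like_gibberish and the 0.6 threshold are assumptions chosen for illustration, and the pattern only covers unaccented English letters.

```python
import re

# A token is "word-like" if it is only letters (inner apostrophes or hyphens
# allowed) or only digits (inner commas or periods allowed).
WORD_LIKE = re.compile(r"^(?:[A-Za-z]+(?:['-][A-Za-z]+)*|\d+(?:[.,]\d+)*)$")

def looks_like_gibberish(text, min_word_ratio=0.6):
    """Return True when too few tokens in `text` look like real words."""
    tokens = text.split()
    if not tokens:
        return True  # nothing was recognized at all
    # Strip surrounding punctuation before testing each token.
    good = sum(1 for t in tokens if WORD_LIKE.match(t.strip(".,;:!?\"'()[]")))
    return good / len(tokens) < min_word_ratio
```

A clean sentence passes the check, while a string full of stray symbols and letter-digit mixes such as "T#e qu1ck br0wn f@x" is flagged. If a document is flagged, rescanning it at a higher quality is usually more effective than trying to repair the text by hand.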

© 2007 Tech Guide