Scanned PDF to OCR Text?

DaveS · 26 April 2020 19:51

I have a PDF file containing the SCANNED pages from a 1980 manual… So there is no “text” per se in the file, just “pictures”.

Does anyone know of any way to use some kind of OCR program to create a text file that I can then clean up?

The file is 66 MB.

cmalumphy · 26 April 2020 20:09

You could try a trial version of PDFPen. I’ve used it to OCR a PDF and then selected and pasted into BBEdit or any other word processor or editor for cleanup.

DaveS · 26 April 2020 20:56

Thanks… I used that, then the PDF2GO website to extract the text (PDFPen required a paid subscription for that). The OCR isn’t great, so I’m not sure how much work the cleanup will be.

cmalumphy · 26 April 2020 21:17

I’ve never found OCR to be really good. After using PDFPen to OCR a document, I’ve been able to just select the text (dragging the mouse over it or using control-a), then copy and paste. My documents are only a few pages long, however, and cleanup, especially of line endings, is quite a chore.
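
For the line-ending part of the cleanup, a small script can do most of the rejoining before any hand editing. Here is a minimal Python sketch, assuming the OCR output has been saved as plain text with blank lines between paragraphs. The function name `reflow_ocr_text` is made up for illustration and is not part of PDFPen or any other tool mentioned here.

```python
import re


def reflow_ocr_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines stay as paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at a line end, e.g. "man-\nual" -> "manual".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # One or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse remaining whitespace runs (including single newlines) to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "This is a man-\nual from\r\n1980.\n\n\nSecond   para\ngraph."
    print(reflow_ocr_text(sample))
```

Note that the hyphen rule also glues together real compounds that happen to break at a line end ("well-\nknown" becomes "wellknown"), so the result still needs a proofreading pass.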

DaveS · 26 April 2020 21:29

This one is 206 pages. The original scanned PDF was 60+ MB, the OCR’d one dropped to 27 MB, and so far the Word doc version is 260K. That’s about 230x smaller than the original!

Of course, the original had hundreds of actual images (screenshots, etc.) that the OCR didn’t carry over…

SpeedLimitChallenger · 26 April 2020 23:26

Curious if NitroPDF could give you the results you are looking for.

DaveS · 26 April 2020 23:42

Perhaps… but I’m already 100 pages into the cleanup using Word.