Sunday, July 10, 2011

Scanning -- again

Still the scanning saga continues.

I finally got the document scanned and OCRed and started on the corrections. I quickly discovered I'd been too optimistic when I said that one error in 200 was acceptable. It's not really.

In an 18,000-word story that amounts to several thousand errors, and some of them are things like substituting an "l" for an "I", which are hard to spot.

What's worse, several sections of the story didn't scan, so I've got holes I have to refill by retyping several pages at a stretch.

Now I am not a proofreader, I'm a writer. I don't have the kind of detail-oriented mind it takes to ferret out errors like this and fix them.

And then there's the other problem.

This one I can't blame on the OCR software. It seems that the magazine ended each line with a hard line break. Ducky, except that the story was set two columns to a page, so to get the copy prepared I've got to go through and take out all those hard line breaks.

I won't bore you with a long, technical explanation of why that's a problem, except to say that very few word processors allow you to find and replace hard line breaks the way you can other characters. To do it you need to delve into regular expressions -- which are their own brand of magic -- and the regular expressions in LibreOffice are unusually arcane.

What it comes down to is I've got to go through the story and reinsert the paragraph breaks that were lost when I got rid of the hard line breaks. By hand.
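For anyone facing the same chore, here is a rough sketch of how the unwrapping might be scripted outside the word processor, in Python. It rests on one big assumption that may not hold for a given scan: that real paragraph breaks show up as blank lines in the OCR output, while everything else is just a hard line break. The function name `unwrap_hard_breaks` is my own invention for illustration, not part of any tool mentioned here.

```python
import re


def unwrap_hard_breaks(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a real paragraph break; any other
    line break is a hard wrap left over from the column layout.
    """
    # Normalize Windows and old Mac line endings to plain "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever one or more blank lines appear.
    paragraphs = re.split(r"\n\s*\n", text.strip())

    unwrapped = []
    for para in paragraphs:
        # Rejoin words split by a hyphen at the end of a line,
        # e.g. "maga-\nzine" becomes "magazine". Caution: this also
        # joins genuinely hyphenated words that happen to break there.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn every remaining line break into a single space.
        para = re.sub(r"\s*\n\s*", " ", para)
        unwrapped.append(para.strip())

    # Put the paragraphs back together with one blank line between them.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = (
        "It seems that the maga-\n"
        "zine ended each line\n"
        "with a hard break.\n"
        "\n"
        "Second paragraph\n"
        "here."
    )
    print(unwrap_hard_breaks(sample))
```

If the scan has no blank lines between paragraphs, this will run the whole story together into one block, and some other cue (an indented first line, say) would have to stand in for the blank line.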

As a result I've got a headache and I'm still not done with this.

I've come to several conclusions here. The first is that I need better OCR software. FreeOCR just isn't accurate enough for long documents. So I'm getting OmniPage from Nuance, which is $150 but comes highly recommended. The second conclusion is I may need more than that if I'm going to scan in all my stories for my collection "Cooks Book".

I'll try OCRing with OmniPage, but if that doesn't work I'll either have to get a better scanner or farm the job out to a service bureau.

Which reminds me of something John W. Campbell told me once: "Always use the right tool for the job. The right tool to fix a television set is a television repairman."
