Wednesday, July 20, 2011

Still the scanning saga continues. . .

Now I'm looking into scanning services for Obsidian Harvest. And boy, what a long, strange trip that is.

So far I haven't found anyone who will scan my document (pages cut from a magazine) and give me a .txt or .doc file of the results. My last hope is OfficeMax. They won't give me a text file, but they will provide a .pdf of the pages. I'm pretty sure I can use OmniPage to OCR the file.

The first problem I ran into was that scanning services either weren't interested or couldn't do the job. My lousy 28 pages were just too small for them to mess with. As one of them put it: "Now if you had 2800 pages. . ." The mind boggles. It's not just that my job is small; the pages are physically small, too. They're digest size, about half a regular sheet of typing paper. Scanning services are mostly set up to scan letter- or legal-size pages, and running a small page through the scanner risks jamming the document feeder.

Finally, after several turndowns, I thought of OfficeMax. They could do it for 25 cents a page, which is quite reasonable, and would give me a .pdf of the scanned pages for further work.

So in theory this new approach should work. But in theory scanning the story in with the OCR software on my HP printer should have worked as well.

Oh well, at least I'm accumulating a lot of information for my next book.


Rick O said...

Maybe you already addressed this, but have you considered a little Mechanical Turk action? That is, upload the scanned images to a hosting service (such as imgur), then start a shared document (such as a GitHub Gist or a Google Doc) and let rabid fans do all of the transcription for you?

Rick Cook said...

That's a good idea, except for a couple of things. First, the real problem is getting a reasonably accurate scan in the first place.

Second, it would be a poor approach for this article because it's full of Aztec names like "heutlacoatl".

Well, we'll see how this approach works.

Jennifer Lu said...

Hey Rick, I've done magazine and newspaper scans for fashion models and others. Remind me to tell you the story of the 'room of me' sometime. I know how to handle the tricky parts of getting them to scan right, because of the background 'noise' inherent in their printing process. Give me a ring, 'cause I could scan it, check it, and give it to you in many common formats. It would give me another excuse to read more of your books.
Jennifer Trethewy