This is a topic for which we receive many questions! It's not a quick and easy topic, so tuck in. BabyBot, over on the right, will be happy to help.

If you've got a book in print, or your former publisher gave you an older-style PDF that's an image only, not searchable, you have pretty much only two options for making an eBook. You can retype the whole book (d'oh!), or you can have it scanned and published.
We get a lot of questions about scanning, and folks often get really upset when they find out just what it takes to make a book go from in-print to published-in-eBook state. We're here to try to nail down those steps for you, so that they're not such a big surprise. The process isn't as onerous as it appears below, but neither is it a "one-stop-shop" type of operation. Beware those scanning companies out there that offer to make your eBook for cheap, because all they're doing is outputting raw scan material to an eBook format, and trust us when we say that's not what you want your eBook to look like. (If you don't believe us, have them go ahead and make the cheap eBook for you, then open it and take a look.)
The Steps of Scanning a Book/PDF into an eBook are:
1. Scan the print book, which can be done either:
    1. More affordably, by breaking the book and ripping off the spine; or
    2. More expensively, by keeping the book and spine intact (usually an additional $0.15/page).
    The average price for scanning alone in the US ranges from $0.40/page to $1.25/page for plain text; highly formatted text (a medical or financial textbook, for example) will run more. Quite a bit more!
2. Then the scan has to be run through OCR (Optical Character Recognition), which turns the scan into actual, digitized text. The charge for this varies, but it is usually included in whatever price you are quoted in Step 1.
3. The resulting files (which should be in Word and PDF format) have to be proofed.
    1. The average price for proofing runs approximately $1/page, or at least $10-$15/hour for 8-10 pages/hour.
4. Once the files are proofed, the author/publisher should review them (if the proofing was sub-contracted out) and approve them as ready for eBook conversion.
5. We then take the finalized Word/PDF file from the OCR'd, scanned book and begin the process of conversion, which runs approximately like this:
    1. We export/extract all the HTML from the Word book/Ms/document;
    2. We clean up the code, which is always poor quality from scanning services (no fault of theirs);
    3. We place the properly coded text into XHTML and package it as an ePUB, which is one type of eBook format;
    4. We send you that ePUB for review, along with Proof sheets;
    5. You review the eBook and return it to us with whatever corrections are needed;
    6. We then make those corrections, and return the revised ePUB to you for approval;
    7. You approve it;
    8. We export the XHTML from within that ePUB, convert it into HTML, and "down-code" it to suit the Kindle device;
    9. We make a Kindle book, which we send to you for review and approval, repeating sub-steps 5-7 if necessary, although that's rare.
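To get a feel for what the scanning and proofing steps add up to, here's a rough back-of-the-envelope sketch in Python. It uses only the per-page figures quoted above. The function name `estimate_cost` and the 300-page book are our own illustrative assumptions, and the sketch leaves out conversion fees and any premium for highly formatted text, since those vary.

```python
# Per-page figures quoted in the steps above (US dollars).
SCAN_LOW = 0.40            # plain-text scanning, low end
SCAN_HIGH = 1.25           # plain-text scanning, high end
INTACT_SPINE_EXTRA = 0.15  # usual surcharge for keeping the spine intact
PROOF_PER_PAGE = 1.00      # approximate proofing rate


def estimate_cost(pages, keep_spine=False):
    """Return (low, high) dollar estimates for scanning plus proofing.

    Hypothetical helper for illustration only; OCR is assumed to be
    included in the scanning price, as it usually is.
    """
    extra = INTACT_SPINE_EXTRA if keep_spine else 0.0
    low = pages * (SCAN_LOW + extra + PROOF_PER_PAGE)
    high = pages * (SCAN_HIGH + extra + PROOF_PER_PAGE)
    return round(low, 2), round(high, 2)


if __name__ == "__main__":
    # Hypothetical 300-page, plain-text book.
    for keep in (False, True):
        low, high = estimate_cost(300, keep_spine=keep)
        label = "spine intact" if keep else "spine removed"
        print(f"{label}: ${low:,.2f} to ${high:,.2f}")
    # spine removed: $420.00 to $675.00
    # spine intact: $465.00 to $720.00
```

Treat the result as a ballpark for budgeting, not a quote: actual prices depend on the scanning service and on how clean the source book is.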
Where most folks run into trouble is during Step 4. Many authors/self-publishers expect to hand over their book/PDF and have a fully proofed, clean, ready-for-conversion Word file emerge at Step 3. Nothing could be further from the truth. While the scanner would/should have cleaned up the file under Step 3, to remove mistakes that the scanning software identifies, that person's job does not extend to cleaning up the file to a ready state.
By this I mean cleaning up:
o erroneous page breaks (typically at the bottom of each page);
o pilcrows (end-of-paragraph codes; you may have seen them in Word or other word-processing applications, where they look like backward Ps) inserted at the end of a line, in the right margin, where no paragraph actually ends;
o running headers left in the file;
o page numbers left in the file.
These items are not cleaned up by any scanner. The fee you paid earlier for an "edit" is solely to find those things in the text that were missed by the scanner or scanned incorrectly (like "hat" for "fiat"). That's not the same thing as then cleaning up the scanned file, so that it's in the same state as a clean, ready-to-go Word file.
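If you'd like to see how much of this residue is sitting in your own scanned text, a small script can at least point at the likely culprits. The following is a minimal sketch in Python, not the tool any scanning service actually uses. It assumes the OCR output has been saved as plain text, one line per list entry, and the function name `flag_residue`, the repeat threshold, and the 60-character limit are all our own illustrative choices.

```python
import re
from collections import Counter

# Assumption: a page number is a line holding only 1-4 digits,
# optionally preceded by the word "page".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def flag_residue(lines, min_repeats=3, max_header_len=60):
    """Return (line_index, reason) pairs for likely scan residue.

    Assumption: a short line that recurs at least `min_repeats` times
    is a running header left over from the printed pages.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    flagged = []
    for index, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        if PAGE_NUMBER.match(text):
            flagged.append((index, "page number"))
        elif counts[text] >= min_repeats and len(text) <= max_header_len:
            flagged.append((index, "running header"))
    return flagged


if __name__ == "__main__":
    # Made-up sample standing in for three scanned pages.
    sample = [
        "MY BACKLIST NOVEL", "It was a dark and stormy night.", "12",
        "MY BACKLIST NOVEL", "The rain fell in torrents.", "13",
        "MY BACKLIST NOVEL", "Except at occasional intervals.", "14",
    ]
    for index, reason in flag_residue(sample):
        print(index, reason, repr(sample[index]))
```

Note that the sketch only flags lines; it doesn't delete anything. A short line of dialogue that happens to repeat three times would be flagged as a "running header" too, which is exactly why a human still has to look at the results.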
But, you'll ask, if not the scanner, who is responsible for the clean-up?
You are. As the publisher, this is your job. Now...we get a lot of feedback from clients that they don't know how, or just don't want to be bothered. That's fine, but having someone else do that job of course costs you more money. When it comes to file clean-up, it comes down, as most things do, to your time or your money.
We can run automated clean-up programs that will find broken paragraphs, incorrect hard page-end codes, and the like, but those automated tools will only find 80-95% of all the scan formatting errata. If you pay us to run the automated program, you'll still have to deal with the remaining unfixed items when we send you the eBook files for review. Some clients choose to then put those remaining items in their proof forms for us to clean up manually, and we can do that, but again, that costs money. Just as there's no such thing as a free lunch, there's no such thing as free clean-up.
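To show why automation gets you most of the way but not all of it, here's a minimal sketch in Python of one such clean-up pass: rejoining paragraphs that the scan broke at the right margin. It is not our production tool. The function names and the rule it relies on (a real paragraph ends in closing punctuation) are assumptions made for illustration.

```python
# Assumption: closing quotes/brackets may follow the final punctuation mark.
CLOSERS = "\"')]\u201d\u2019"
ENDINGS = (".", "!", "?", ":")


def looks_complete(text):
    """Guess whether a paragraph ends where a sentence ends."""
    return text.rstrip(CLOSERS).endswith(ENDINGS)


def rejoin_broken_paragraphs(paragraphs):
    """Merge each paragraph that looks cut off with the one after it."""
    merged = []
    carry = ""
    for para in paragraphs:
        text = para.strip()
        if not text:
            continue
        if carry:
            if carry.endswith("-"):
                # Treat a trailing hyphen as a word split at the line end.
                text = carry[:-1] + text
            else:
                text = carry + " " + text
            carry = ""
        if looks_complete(text):
            merged.append(text)
        else:
            carry = text
    if carry:
        merged.append(carry)
    return merged


if __name__ == "__main__":
    # Made-up sample: three fragments of one paragraph, then a heading.
    scanned = [
        "The scanner stopped at the right mar-",
        "gin and inserted a break",
        "that was never in the book.",
        "Chapter Two",
        "A new paragraph starts here.",
    ]
    for para in rejoin_broken_paragraphs(scanned):
        print(para)
    # The scanner stopped at the right margin and inserted a break that was never in the book.
    # Chapter Two A new paragraph starts here.
```

The first paragraph comes out right, but the heading "Chapter Two" has no closing punctuation, so the sketch wrongly glues it onto the paragraph that follows. A genuinely hyphenated word split across lines would lose its hyphen, too. Those are the kinds of leftovers that still need a human eye.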
Keep this in mind when you consider making your backlist books into digital books. It's far from impossible, and thousands of other people have done it, but don't be surprised when it turns out to be more work than you expected.