Just wondering whether you've considered the DjVu format for your work? DjVu is a file format designed to efficiently compress scanned documents, with a free, open-source implementation (DjVuLibre), and it can carry an OCR text layer alongside the page images. The result is a small file with features such as progressive loading (e.g. over the Web), full-text search, and faithful reproduction of both text and images (it automatically segments each page into text and image layers and applies a different compression technique to each). See http://www.djvuzone.org/ to check it out.
- nic
