nlp-private:ocr-engine-pros-and-cons [2015/04/23 13:38] (current) ryancha created

* Cuneiform
:Pros
:*Free
:*Mostly open source. The rest is to be released in the future.
:Cons
:*Currently Windows only. A port to Mac and Linux is planned, but work on it seems to have stalled.
:*Text output only, no searchable PDF
:*Much of the documentation is in Russian

* GOCR (JOCR)
:Pros
:*Free
:*Open source
:Cons
:*Text output only
:*Images are converted to PBM, PGM, or PPM. Is this a lossy conversion?

* OCRAD
:Pros
:*Free
:*Open source
:Cons
:*Text output only
:*PBM, PGM, and PPM images only

* Expervision (This has a trial version, which I have not been able to get yet)
:Pros
:*Exports to searchable PDF
:*Has an SDK for use with C/C++
:Cons
:*Licensing model involves royalty fees
:*Says it is compatible with all operating systems, but the demo information says Visual C++ is required?

* Microsoft Office Document Imaging (I have not found a computer with this installed yet)
:Pros
:*Comes free with some/all versions of MS Office on Windows. It is an optional install, so many computers do not have it installed.
:*I have read that text coordinates can be obtained from MODI.
:*Takes TIFF images
:Cons
:*Windows only

* ReadSoft
:ReadSoft is geared more toward businesses looking for ways to automate large-scale document processing for managing and organizing data. OCR is only a small part of what they do. In talking with them, it doesn't seem like they have a product that is very specific to what we are doing.

* SimpleOCR (I have not tried the demo yet)
:Pros
:*Freeware version, including a command-line version and an SDK
:*Documentation says it can return coordinates of recognized words and images
:*Takes TIFF and other images
:Cons
:*Windows only
:*Does not appear to output PDF

* PDF OCR X
:Pros
:*Free version and paid ($30) "Enterprise" version. The free version restricts PDFs to one page.
:*Takes TIFF files
:Cons
:*Text output only

* NovoDynamics
:Pros
:*Professional version creates searchable PDFs
:Cons
:*Focused mainly on Middle-Eastern and Asian languages. Works on "Embedded English."
:*Expensive. Standard version costs $1300. Professional version is "call for pricing."
:*No demo available (at least not to me, perhaps if I were a better potential customer?)

* MoreData/MoreDataFast (this is based on Tesseract)
:Pros
:*Free
:Cons
:*Windows only
:*Text output only
:*Documentation in Italian

* BrainWare
:Brainware is a product like ReadSoft, geared toward the bigger picture of automating data management. OCR is only a small piece of what they do.
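
On the GOCR question of whether conversion to PBM, PGM, or PPM is lossy: the binary PNM formats store raw, uncompressed samples, so the answer should depend on the source image rather than on the format. The sketch below is only an illustration, assuming 8-bit RGB input with no alpha channel. The helpers `write_ppm`, `read_ppm`, and `to_gray` are hypothetical, written for this note, and are not part of GOCR or OCRAD; the converter those tools actually call may behave differently. It shows that RGB pixels survive a round trip through PPM unchanged, while reducing them to a single gray channel (what a PGM holds) can map different colors to the same value.

```python
# Hypothetical helpers for illustration only; not GOCR or OCRAD code.

def write_ppm(width, height, rgb):
    """Encode 8-bit RGB bytes as a binary PPM (P6) file image."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel buffer does not match dimensions")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(rgb)


def read_ppm(data):
    """Decode a P6 file produced by write_ppm (header comments not handled)."""
    # maxsplit=3 keeps any newline bytes inside the pixel data intact.
    magic, dims, maxval, pixels = data.split(b"\n", 3)
    if magic != b"P6" or int(maxval) != 255:
        raise ValueError("only 8-bit binary PPM is handled in this sketch")
    width, height = (int(n) for n in dims.split())
    if len(pixels) != width * height * 3:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def to_gray(rgb):
    """Average each RGB triple into one 8-bit sample, as a PGM would store."""
    return bytes(sum(rgb[i:i + 3]) // 3 for i in range(0, len(rgb), 3))


# A 2x2 test image: red, green, a dark blue-gray, and light gray.
pixels = bytes([255, 0, 0, 0, 255, 0, 10, 20, 30, 200, 200, 200])

width, height, restored = read_ppm(write_ppm(2, 2, pixels))
print("PPM round trip is exact:", restored == pixels)

gray = to_gray(pixels)
print("red and green collapse to the same gray:", gray[0] == gray[1])
```

So for scanned pages that are already bilevel or grayscale, feeding them to GOCR or OCRAD as PBM or PGM should not discard anything by itself; the loss would come from any color reduction, thresholding, or alpha removal done before the file is written.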