nlp:synthetic-ocr-data [2015/04/21 16:34], created by ryancha
Thank you for your interest in our synthetic OCR datasets.
We are currently preparing a set of synthetic OCR datasets for publication, based on three common text analytics datasets: 20 Newsgroups, Reuters-21578, and the Enron e-mail corpus.
We are completing the finishing touches and plan to publish the datasets through the Linguistic Data Consortium (LDC).
If you would like to receive updates on the progress of the datasets, or to ask questions about them, you may subscribe to our Google Group, http://groups.google.com/group/byu_synthetic_ocr_data,
which we expect to be very low volume (< 1 message/month).
You may obtain a copy of the code used to generate the datasets by cloning our Git repository:

    git clone git://nlp.cs.byu.edu/generate_artificial_ocr_data.git
nlp/synthetic-ocr-data.txt · Last modified: 2015/04/21 16:34 by ryancha
CC Attribution-Share Alike 4.0 International