 +Thank you for your interest in our synthetic OCR datasets.
 +We are currently working on publishing a set of synthetic OCR datasets based on three common text analytics datasets: 20 Newsgroups, Reuters-21578, and the Enron e-mail corpus.
 +The finishing touches are being completed, and we plan to work with the LDC to publish the datasets.
 +If you would like to receive updates on the progress of the datasets, or would like to ask questions about them, you may subscribe to our [http://groups.google.com/group/byu_synthetic_ocr_data Google Group],
 +which we anticipate will be very low volume (< 1 message/month).
 +You may obtain a copy of the code used to generate the datasets by cloning our Git repository:
 +
 +    git clone git://nlp.cs.byu.edu/generate_artificial_ocr_data.git
