nlp-private:library-ocr-tasks [2015/04/23 13:36] ryancha created
nlp-private:library-ocr-tasks [2015/04/23 13:37] ryancha
Line 7:
* Use more OCR engines in the current system
** [http://www.irislink.com/c2-1584-189/Readiris-12---OCR-Software-------Convert-your-Paper-Documents-into-Editable-Text-.aspx ReadIRIS]
- ** Adobe Acrobat OCR -- Assigned to [[User:Cr24|Chris Rotz]]
+ ** Adobe Acrobat OCR -- Assigned to [[Cr24|Chris Rotz]]
** [http://www.primerecognition.com/ Prime OCR]
* Use an OCR confusion matrix to adjust the costs of mismatches. The hope is that this will leave fewer paths through the network, which may allow us to process more complex documents and explore the network more quickly.
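One way to read the confusion-matrix idea is as a weighted edit distance in which substituting characters that OCR engines commonly confuse costs less than an arbitrary mismatch. This is a minimal sketch assuming a small hand-written table (`CONFUSION_COST`, with made-up values) and hypothetical function names; it is not the existing DocumentLattice cost function, and real costs would presumably be estimated from aligned OCR output.

```python
# Hypothetical confusion-based substitution costs (illustrative values only):
# character pairs that OCR engines often confuse are cheaper to substitute.
CONFUSION_COST = {
    frozenset(("l", "1")): 0.2,
    frozenset(("o", "0")): 0.2,
    frozenset(("O", "0")): 0.2,
    frozenset(("S", "5")): 0.3,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of aligning character a with character b."""
    if a == b:
        return 0.0
    # Unlisted pairs fall back to a full mismatch cost of 1.0.
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def align_cost(s: str, t: str, indel: float = 1.0) -> float:
    """Weighted Levenshtein distance using confusion-based substitution costs."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + indel,                               # delete s[i-1]
                cur[j - 1] + indel,                            # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),    # match or substitute
            )
        prev = cur
    return prev[-1]


print(align_cost("1ove", "love"))  # → 0.2 (cheap, commonly confused pair)
print(align_cost("lave", "love"))  # → 1.0 (ordinary mismatch)
```

With unit costs, many alignments can tie for the minimum; graded substitution costs tend to break such ties, which is presumably how fewer paths would survive.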
Line 16:
* Use a language model to select between multiple accepted words. Requires augmenting the lattice as described above.
** Need a mid-20th century news corpus for training.
- * [[Sclite Viewer]]: take an Sclite file and view the contents in a way that shows each "sausage". -- Assigned to [[User:Cr24|Chris Rotz]]
+ * [[Sclite Viewer]]: take an Sclite file and view the contents in a way that shows each "sausage". -- Assigned to [[Cr24|Chris Rotz]]
* [[Aligned Backpointer Viewer]]: take the aligned backpointer output of DocumentLattice and view the contents in a way that shows the optimal alignment, the "sausages" and a count of the optimal paths for each sausage.
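Selecting among the accepted words can be framed as a Viterbi search over the sausages: at each position, keep only the best-scoring path ending in each candidate word. This minimal sketch assumes each sausage is a non-empty list of candidate words and that a bigram table of log-probabilities is available (in practice it would be trained on the mid-20th century news corpus noted above); `best_sequence`, the `<s>` start token, the toy probabilities, and the unseen-bigram penalty are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical bigram table: (previous word, word) -> log probability.
Bigram = Dict[Tuple[str, str], float]


def best_sequence(sausages: List[List[str]], bigram: Bigram,
                  unseen: float = math.log(1e-4)) -> List[str]:
    """Pick one word per sausage, maximizing total bigram log-probability (Viterbi)."""
    if not sausages:
        return []
    score: Dict[str, float] = {"<s>": 0.0}  # best log-prob of a path ending in each word
    back: List[Dict[str, str]] = []         # backpointers, one dict per sausage
    for candidates in sausages:             # assumes every sausage is non-empty
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for w in candidates:
            prev, best = max(
                ((p, s + bigram.get((p, w), unseen)) for p, s in score.items()),
                key=lambda item: item[1],
            )
            new_score[w], ptr[w] = best, prev
        score = new_score
        back.append(ptr)
    # Trace back from the best word in the last sausage.
    word = max(score, key=score.get)
    path = [word]
    for ptr in reversed(back[1:]):
        word = ptr[word]
        path.append(word)
    return path[::-1]


toy = {("<s>", "the"): math.log(0.5), ("the", "cat"): math.log(0.1),
       ("the", "eat"): math.log(0.01), ("cat", "sat"): math.log(0.2),
       ("cat", "set"): math.log(0.01), ("eat", "sat"): math.log(0.01)}
print(best_sequence([["the"], ["cat", "eat"], ["sat", "set"]], toy))
# → ['the', 'cat', 'sat']
```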
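For the Aligned Backpointer Viewer, the number of optimal paths can be accumulated in the same dynamic program that computes the alignment cost, by summing the path counts of every predecessor cell that achieves the minimum. This sketch assumes unit edit costs over token sequences and a hypothetical function name; how DocumentLattice actually stores its backpointers, and restricting the count to a single sausage, are not shown and would need to match the real code.

```python
from typing import Sequence, Tuple


def count_optimal_alignments(a: Sequence[str], b: Sequence[str]) -> Tuple[int, int]:
    """Return (edit distance, number of distinct minimum-cost paths) for a vs. b."""
    n, m = len(a), len(b)
    # dist[i][j] / ways[i][j]: best cost and number of optimal paths for a[:i] vs b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                ways[0][0] = 1
                continue
            options = []
            if i > 0:                                   # deletion
                options.append((dist[i - 1][j] + 1, ways[i - 1][j]))
            if j > 0:                                   # insertion
                options.append((dist[i][j - 1] + 1, ways[i][j - 1]))
            if i > 0 and j > 0:                         # match or substitution
                sub = 0 if a[i - 1] == b[j - 1] else 1
                options.append((dist[i - 1][j - 1] + sub, ways[i - 1][j - 1]))
            best = min(cost for cost, _ in options)
            dist[i][j] = best
            # Every predecessor that ties for the minimum contributes its paths.
            ways[i][j] = sum(w for cost, w in options if cost == best)
    return dist[n][m], ways[n][m]


print(count_optimal_alignments("ab", "ba"))  # → (2, 3)
```

A count above 1 flags a position where several alignments tie, which is the kind of ambiguity the viewer is meant to expose.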
Line 60:
** Complete runs on both DS2 and Marylou5
- ==Tasks for [[User:Cr24|Chris Rotz]]==
+ ==Tasks for [[Cr24|Chris Rotz]]==
* Come up to speed
* Run Eisenhower Communiques through Adobe OCR