Status of normalization in Language ID as of 17 July 2008:

As far as I know, normalization is still there and working. It's configurable through the rbldr_norm property in .lidconfig files (i.e. cmake knows about it), so it should be possible to set it at the command line using

cmake -DRBLDR_NORM_FORCE=

followed by the number indicating the type of normalization you want (Introduction to Language ID talks about this). I've never really used it much, even though I know it improves performance. The regression testing infrastructure also supports tracking which type of normalization was used to generate a result. I also wrote a Java reimplementation of the normalization code in edu.byu.nlp.experimentation.results.AbstractIdentificationResult.normalize, but at the moment it's disabled because I wasn't sure it was doing the right thing. –Josh 18:28, 17 July 2008 (MDT)