My discussion with Josh today and some further thought have led me to propose the following feature engineering cycle for language ID. It seems obvious enough and is probably what was intended from the start. I would also like to make some additional points that are perhaps new, which I have added at the end of this message. Here is the cycle:

1. Choose a suitable baseline

2. Choose a test, perhaps the baseline with a few extra features

3. Run the comparison

4. Add the results to the “history” plot (graph of baseline vs. further attempts) and tables

5. Compare the “overall” and/or “average” DET curve, cost, EER, and min cost between test and baseline

6. Examine the DET curve, cost, EER, and min cost PER language, in comparison with the baseline

7. Examine the cost component matrix (similar to confusion matrix)

8. Choose a language of interest based on steps 6 and 7 (possibly skipping directly to step 11)

9. Examine the metrics and plots of the language chosen in step 8 against all other languages (derived FROM THE N-LANGUAGE DATA)

10. If nothing interesting is found, repeat from step 7 or 8

11. Pick a troublesome/interesting language pair

  • Let the features from 2 be the baseline
  • Run a 1 v 1 of the chosen languages
  • Examine metrics and plots from the 1 v 1. NOTE: there’s no real equivalent to step 9, since the main metric and plot is the ONLY language vs. language plot
  • Create/add to the history plot for this language pair (separate from overall history)
  • Repeat until a significant improvement has been achieved or until attempts have been exhausted

12. If the features from step 11 were unsuccessful, start again at step 2, allowing the program to “forget” the bad attempts if necessary (they can still be saved to disk)

13. Otherwise, add the features from step 11 to those defined in step 2 and repeat steps 3-9, paying particular attention to the effect of the features on the language pair chosen in step 11 (I’ll emphasize that this data is taken from the N-LANGUAGE models/outcomes/etc.)

14. Report the results and any observations on the wiki, including information from step 11

15. Start again at step 2
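
As a rough illustration of steps 5 and 6, here is a minimal Python sketch that computes a per-language EER from n-language (1 v rest) scores and reports the change between baseline and test. The trial format, the function names, and the toy scores are all assumptions made for this example; they are not taken from our actual tooling or data.

```python
# Hypothetical trial format: (true_language, {language: 1-v-rest score}).

def eer(target_scores, nontarget_scores):
    """Approximate equal error rate; a trial is accepted when score >= threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    best = None
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the operating point where miss and false-alarm rates are closest.
        if best is None or abs(p_miss - p_fa) < abs(best[0] - best[1]):
            best = (p_miss, p_fa)
    return (best[0] + best[1]) / 2


def per_language_eer(trials, languages):
    """EER of each language's 1-v-rest detector over all trials."""
    result = {}
    for lang in languages:
        targets = [scores[lang] for truth, scores in trials if truth == lang]
        nontargets = [scores[lang] for truth, scores in trials if truth != lang]
        result[lang] = eer(targets, nontargets)
    return result


def eer_change(baseline_trials, test_trials, languages):
    """Per-language EER of test minus baseline; negative means improvement."""
    base = per_language_eer(baseline_trials, languages)
    test = per_language_eer(test_trials, languages)
    return {lang: test[lang] - base[lang] for lang in languages}


# Toy scores, invented purely for illustration.
languages = ["en", "es"]
baseline_trials = [
    ("en", {"en": 0.9, "es": 0.2}),
    ("en", {"en": 0.4, "es": 0.6}),
    ("es", {"en": 0.5, "es": 0.7}),
    ("es", {"en": 0.1, "es": 0.8}),
]
test_trials = [
    ("en", {"en": 0.9, "es": 0.2}),
    ("en", {"en": 0.6, "es": 0.3}),
    ("es", {"en": 0.5, "es": 0.7}),
    ("es", {"en": 0.1, "es": 0.8}),
]
change = eer_change(baseline_trials, test_trials, languages)
print(change)  # {'en': -0.5, 'es': 0.0}
```

A table like `change` is the kind of thing that could feed the history plot in step 4 and help pick the language of interest in step 8.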

Now for the important points. The data used in step 9 should come from the n-language test, NOT from any 1 v 1 data. It is entirely possible to build a DET curve from the n-language data, filtered to those files where either the truth or the hypothesis is language a or b. It is important to use this data (from the 1 v rest models) because, in the end, what REALLY matters is the task we care about, and that task involves reporting metrics and curves using the 1 v rest models. That, then, should be what we use to diagnose problems. In classification, it would be silly to diagnose problems with a confusion matrix in which each cell comes from a 1 v 1 classifier if you are in fact using a multi-class classifier to do the actual classification. Of course you would use the confusion matrix from the multi-class classifier. That is not to say that the 1 v 1 data is useless, but it certainly shouldn’t be used for diagnosis if it isn’t used when computing the metrics. The above procedure shows where 1 v 1 data is useful: AFTER diagnosis, as a (greedy) estimate of progress on a single problem.
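
To make the filtering idea concrete, here is a minimal Python sketch of building DET points for a language pair from n-language data only. It assumes a hypothetical trial format of (true language, dictionary of 1 v rest scores) and takes the hypothesis to be the highest-scoring language; the function names and toy scores are invented for the example.

```python
# Hypothetical trial format: (true_language, {language: 1-v-rest score}).

def hypothesis(scores):
    """Assumed decision rule: the language with the highest 1-v-rest score."""
    return max(scores, key=scores.get)


def filter_pair(trials, lang_a, lang_b):
    """Keep n-language trials whose truth OR hypothesis is lang_a or lang_b."""
    pair = {lang_a, lang_b}
    return [(truth, scores) for truth, scores in trials
            if truth in pair or hypothesis(scores) in pair]


def det_points(trials, detector_lang):
    """(threshold, p_miss, p_fa) triples for one 1-v-rest detector."""
    targets = [s[detector_lang] for truth, s in trials if truth == detector_lang]
    nontargets = [s[detector_lang] for truth, s in trials if truth != detector_lang]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")
    thresholds = sorted(set(targets) | set(nontargets)) + [float("inf")]
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in targets) / len(targets)
        p_fa = sum(s >= t for s in nontargets) / len(nontargets)
        points.append((t, p_miss, p_fa))
    return points


# Toy n-language scores, invented purely for illustration.
trials = [
    ("en", {"en": 0.9, "es": 0.3, "pt": 0.1}),
    ("es", {"en": 0.2, "es": 0.4, "pt": 0.7}),  # es file hypothesized as pt
    ("pt", {"en": 0.1, "es": 0.6, "pt": 0.5}),  # pt file hypothesized as es
    ("pt", {"en": 0.2, "es": 0.1, "pt": 0.8}),
    ("en", {"en": 0.8, "es": 0.2, "pt": 0.3}),
    ("es", {"en": 0.1, "es": 0.9, "pt": 0.4}),
]
pair_trials = filter_pair(trials, "es", "pt")
points = det_points(pair_trials, "es")
for threshold, p_miss, p_fa in points:
    print(threshold, p_miss, p_fa)
```

No 1 v 1 model is trained here: the pair view is just a different slice of the same n-language scores that the reported metrics come from.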

To summarize, we should always diagnose a model based on data from that model, and not from different models.

This argument holds without loss of generality. Even if we switch to 1 v 1 models for reporting our metrics, the process of combining them in order to make a hard decision about specific trials means that our “decision model” involves more than just the 1 v 1 maxent models (or SVMs). It is therefore data from this “decision model” that we should use to diagnose problems. Other data is helpful, but ancillary.

Besides being the correct thing to do, it is advantageous in a practical way: less training (at least in the case where the 1 v rest models are used in our decision model)! Instead of training the 1 v rest models AND all pairs of 1 v 1 models, I need only train the 1 v rest models plus the 1 v 1 pairs that seem interesting to me. The payoff of this will be very evident when we are working with speech.
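
A quick back-of-the-envelope count shows the size of the saving. The sketch below assumes one model per language for 1 v rest and one model per unordered language pair for 1 v 1; the figures of 20 languages and 3 interesting pairs are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import comb


def models_with_all_pairs(n_languages):
    """N 1-v-rest models plus every unordered 1-v-1 pair: N + N(N-1)/2."""
    return n_languages + comb(n_languages, 2)


def models_with_selected_pairs(n_languages, n_interesting_pairs):
    """N 1-v-rest models plus only the pairs chosen after diagnosis."""
    return n_languages + n_interesting_pairs


# Hypothetical sizes, not our actual language set.
n_languages = 20
n_interesting_pairs = 3
print(models_with_all_pairs(n_languages))                            # 210
print(models_with_selected_pairs(n_languages, n_interesting_pairs))  # 23
```

The all-pairs count grows quadratically with the number of languages, while the selected-pairs count grows only with the number of problems we decide to chase.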

I hope I explained myself well and I hope you’ll agree, but please let me know your thoughts on the matter. Robbie

nlp-private/feature-engineering-cycle.txt · Last modified: 2015/04/23 14:52 by ryancha