Items from Qualifying Paper

  • Initialize Gibbs with noisy marginal technique
  • Tom Griffiths: measure convergence using inter-chain metrics that are immune to label switching
  • Evolution (or lack thereof) over time between samples within the same chain
    • Autocorrelation
      • likelihood within chains
      • favorite clustering metrics (e.g., ARI, unnormalized KL divergence of each cell on the diagonal)
    • Plot these over time (like the divergence movie in a single graph)
  • Question about the negative correlation between the MAP sample and the metrics on the “comb”
  • EM as a refinement of Gibbs, to climb from a Gibbs sample to the nearest (local?) mode
  • Gelman: converged when the inter-chain variance is the same as the intra-chain variance
    • on likelihood
    • on metrics
  • Another chain-summary idea from Kevin Seppi: the most frequent label occurring in the last 100 samples
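A minimal sketch of Gelman's criterion, the Gelman-Rubin potential scale reduction factor (R-hat), assuming burn-in has already been discarded, all chains have the same length, and each sample is summarized by one number (its log-likelihood, or a metric score). R-hat compares a pooled variance estimate with the within-chain variance, so values near 1 mean the two agree. The function name is illustrative; a common rule of thumb is to keep sampling until R-hat drops below about 1.1.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar trace per chain.

    chains: list of m equal-length lists of floats (e.g. per-sample log-likelihoods).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    # B: between-chain (inter-chain) variance, scaled by the chain length n
    b = n * variance(chain_means)
    # W: average within-chain (intra-chain) sample variance
    w = mean(variance(c) for c in chains)
    # Pooled estimate of the target variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Running it once on the likelihood traces and once on each metric's trace covers both sub-items of the Gelman note.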
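To quantify how much samples within one chain actually change, one option is the sample autocorrelation of a per-sample scalar trace: the likelihood, or (an assumption here) a metric such as the ARI between each sample and a fixed reference partition. This sketch takes the trace as a plain list of floats; the function name is illustrative. Plotting the returned values against lag puts the whole chain's mixing behavior in a single graph.

```python
from statistics import mean


def autocorrelation(trace, max_lag):
    """Sample autocorrelation of a scalar trace for lags 0..max_lag."""
    n = len(trace)
    mu = mean(trace)
    centered = [x - mu for x in trace]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant trace")
    # Lag-k autocovariance divided by the lag-0 value (standard biased estimator)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(min(max_lag, n - 1) + 1)
    ]
```

Slowly decaying values mean consecutive samples are nearly copies of each other, so fewer effective samples than iterations.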
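One reading of Kevin Seppi's summary, assumed here rather than stated in the notes, is per item: for each data item, keep the label it was assigned most often over the last 100 samples of a single chain. This only makes sense if labels do not switch within that window. Ties go to the label encountered first in the window, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter


def modal_labels(samples, window=100):
    """Per-item most frequent label over the last `window` samples of one chain.

    samples: list of assignments, each a list with one cluster label per item.
    """
    recent = samples[-window:]
    n_items = len(recent[0])
    return [
        Counter(sample[i] for sample in recent).most_common(1)[0][0]
        for i in range(n_items)
    ]
```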

Near Term

  • Get comparable likelihood measures for Gibbs and EM
  • Implement and run Variational EM
  • Do a Tech Report with the full derivation of the collapsed sampler
  • Start EM from a Gibbs sample
  • Develop the “comb” idea
  • Implement versions of the partition comparison metrics that can be run on samples (intra-chain and cross-chain)
  • Look at the mean entropy metric. Can this be adapted for
  • Experiment with feature selectors/dimensionality reducers
  • Split out a held-out dataset to compute held-out likelihood on Enron
  • Automatic stop-word detection / feature selection
  • Complete bibliography of clustering techniques in prep
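For the held-out likelihood item, a minimal sketch of a reproducible split, assuming documents are held out whole and a fixed random seed is shared so Gibbs and EM are scored on exactly the same held-out documents (which also helps with getting comparable likelihood measures). The function name, the 10% default, and the seed are hypothetical.

```python
import random


def split_held_out(documents, held_out_fraction=0.1, seed=0):
    """Shuffle document indices with a fixed seed and hold out a fraction of them."""
    if not 0.0 <= held_out_fraction < 1.0:
        raise ValueError("held_out_fraction must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed: every run sees the same split
    indices = list(range(len(documents)))
    rng.shuffle(indices)
    n_held_out = round(len(documents) * held_out_fraction)
    held_out_ids = set(indices[:n_held_out])
    # Preserve the original document order within each side of the split
    train = [d for i, d in enumerate(documents) if i not in held_out_ids]
    held_out = [d for i, d in enumerate(documents) if i in held_out_ids]
    return train, held_out
```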
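One simple heuristic for automatic stop-word detection, offered as an example rather than taken from these notes, is document-frequency thresholding: flag any word that appears in more than some fraction of the documents. The sketch assumes each document is already a list of tokens; the 0.5 default is a hypothetical starting point to tune on Enron.

```python
from collections import Counter


def detect_stop_words(documents, max_doc_fraction=0.5):
    """Return words whose document frequency exceeds max_doc_fraction."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {word for word, df in doc_freq.items() if df / n_docs > max_doc_fraction}
```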

Longer Term

  • Reproduce a result from one of the papers (LDA)
  • Identify something in the model that can be improved
  • Implement differences and write a paper

CS 601R

  • Fix held-out set handling for CS 601R


  • 9/15: Present hierarchical bisecting k-means clustering algorithm at NLP Lab Meeting
  • 9/25: Finish LogCounter (or set it aside for near-term experiments)
  • 9/21: Label every PC with a name
  • 9/25: Get a copy of Hal's evaluation script
  • 9/30: Figure out the profiling situation (JProfiler)
  • 10/2: Send me your 598R PowerPoint
  • 10/6: Subscribe to the topic-models mailing list at Princeton: s://lists.cs.princeton.edu/mailman/listinfo/topic-models
  • 10/6: Factor clustering away from classification
  • 10/6: Hoist any computation not specific to the current document (e.g., logDistOfC) out of the “foreach document” loop and out of getProbabilities()
  • 10/13: Mine Hal's script for good metrics, etc.
  • 10/11: Fix the Adjusted Rand Index calculation
  • 11/1: Prepare a 10-15 minute presentation detailing your current activities for a CS + Machine Learning audience at the UofU on 11/2.
  • 11/3: Implement precision (P), recall (R), and F_1 as metrics.
  • 11/10: Implement the variation of information metric
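As a reference point for the Adjusted Rand Index fix, a minimal sketch of the standard Hubert-Arabie formula computed from the contingency table of two label lists. It depends only on which items share a cluster, not on label names, so it is immune to label switching and works for inter-chain comparisons. The function name is illustrative, and returning 1.0 when both partitions are trivial (all singletons or a single cluster) is a convention.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions given as equal-length label lists."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of items placed together in both partitions (contingency-table cells)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition (row and column marginals)
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # only happens when both partitions are identical and trivial
    return (sum_cells - expected) / (max_index - expected)
```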
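For P, R, and F_1 on clusterings, one common definition, assumed here since the item does not say which variant is meant, is pairwise: precision is the fraction of item pairs co-clustered in the predicted partition that are also co-clustered in the gold partition, and recall is the reverse. Like ARI, this depends only on pairs, so cluster label names do not matter.

```python
from collections import Counter
from math import comb


def pairwise_prf(gold, predicted):
    """Pairwise precision, recall, and F1 of a predicted partition against gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("label lists must have equal length")
    together_both = sum(comb(c, 2) for c in Counter(zip(gold, predicted)).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    together_gold = sum(comb(c, 2) for c in Counter(gold).values())
    precision = together_both / together_pred if together_pred else 0.0
    recall = together_both / together_gold if together_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```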
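A minimal sketch of the variation of information, VI(A, B) = H(A|B) + H(B|A), computed in nats from the joint label counts. It is 0 exactly when the two partitions are identical up to relabeling, which makes it another label-switching-immune option for comparing chains; the function name is illustrative.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two partitions."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # -p(a,b) * [log p(a,b)/p(a) + log p(a,b)/p(b)], using counts for the ratios
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi
```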
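For the bisecting k-means presentation, a minimal sketch of the divisive procedure: start with every point in one cluster and repeatedly split one cluster in two with 2-means until there are k clusters. Several choices here are assumptions, not necessarily the version to be presented: the cluster split is the one with the largest sum of squared errors (another common choice is the largest cluster), 2-means is seeded deterministically with farthest points instead of random restarts, points are equal-length tuples, and `math.dist` requires Python 3.8+. It returns only the leaf clusters; recording each split would give the hierarchy.

```python
from math import dist


def _centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def _sse(points):
    c = _centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def _two_means(points, iterations=20):
    """Split points in two with Lloyd's algorithm (assumes not all points are equal)."""
    c = _centroid(points)
    s1 = max(points, key=lambda p: dist(p, c))   # farthest from the centroid
    s2 = max(points, key=lambda p: dist(p, s1))  # farthest from s1
    centers = [s1, s2]
    best = None
    for _ in range(iterations):
        left = [p for p in points if dist(p, centers[0]) <= dist(p, centers[1])]
        right = [p for p in points if dist(p, centers[0]) > dist(p, centers[1])]
        if not left or not right:
            break  # keep the last split in which both halves were non-empty
        best = (left, right)
        new_centers = [_centroid(left), _centroid(right)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return best


def bisecting_kmeans(points, k):
    """Return up to k leaf clusters produced by repeated 2-means splits."""
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: _sse(clusters[i]))
        if _sse(clusters[idx]) == 0:
            break  # every remaining cluster is a set of identical points
        left, right = _two_means(clusters.pop(idx))
        clusters.extend([left, right])
    return clusters
```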
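To illustrate the hoisting item, a sketch of a naive-Bayes-style per-document scoring loop; `class_counts`, `word_log_probs`, and the scoring model itself are hypothetical stand-ins, and only the name `log_dist_of_c` mirrors logDistOfC from the note. The class prior never depends on the document, so computing it once outside the loop gives identical scores with less work per document.

```python
import math


def score_naive(documents, class_counts, word_log_probs):
    """Per-document, per-class log scores, recomputing the class prior every time."""
    scores = []
    for doc in documents:
        total = sum(class_counts)
        # Loop-invariant: the same for every document, yet recomputed each iteration.
        log_dist_of_c = [math.log(count / total) for count in class_counts]
        scores.append([
            log_dist_of_c[c] + sum(word_log_probs[c][w] for w in doc)
            for c in range(len(class_counts))
        ])
    return scores


def score_hoisted(documents, class_counts, word_log_probs):
    """Same scores, with the document-independent prior computed once."""
    total = sum(class_counts)
    log_dist_of_c = [math.log(count / total) for count in class_counts]
    return [
        [log_dist_of_c[c] + sum(word_log_probs[c][w] for w in doc)
         for c in range(len(class_counts))]
        for doc in documents
    ]
```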


nlp-private/dan.txt · Last modified: 2015/04/23 13:32 by ryancha