=Word2Vec=
  
==Intro==
“Just as Van Gogh’s painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.” --DL4J
  
Word2Vec can guess a word’s associations with other words, or cluster documents and sort them by topic. It turns qualities into quantities: each word becomes a point in a vector space of a few hundred dimensions (500 in the quote above), where similar things and ideas sit “close” to one another.
  
Word2Vec is not classified as "deep learning" because it is only a 2-layer neural net.
  
Input -> text corpus
Output -> set of vectors, or neural word embeddings
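That contract is short enough to show end to end. A minimal sketch, assuming the Python gensim library and a toy corpus (neither is part of these notes; gensim >= 4.0 names the dimension parameter vector_size):

<code python>
from gensim.models import Word2Vec

# Input: a text corpus, as tokenized sentences (toy example)
corpus = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["knee", "bends", "the", "leg"],
    ["elbow", "bends", "the", "arm"],
]

# Train the 2-layer net over the corpus
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1)

# Output: a set of vectors, one neural word embedding per vocabulary word
print(model.wv["king"])                       # a 50-dimensional vector
print(model.wv.most_similar("king", topn=3))  # nearest words by cosine similarity
</code>

On a four-sentence corpus the neighbors are noise; meaningful “closeness” only emerges from training on a large corpus.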
  
  
===Examples===
  
 Rome - Italy = Beijing - China, so Rome - Italy + China = Beijing
  
 king : queen :: man : woman
  
 house : roof :: castle : [dome, bell_tower, spire, crenellations, turrets]
  
 China : Taiwan :: Russia : [Ukraine, Moscow, Moldova, Armenia]
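These analogies can be reproduced as vector arithmetic on pretrained embeddings. A sketch, assuming gensim’s downloader module and the Google News vectors (an assumption not made by these notes; the download is large, roughly 1.6 GB):

<code python>
import gensim.downloader as api

# Pretrained word2vec vectors trained on Google News
wv = api.load("word2vec-google-news-300")

# king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Rome - Italy + China ≈ Beijing
print(wv.most_similar(positive=["Rome", "China"], negative=["Italy"], topn=1))
</code>

The bracketed lists above are plausible answer sets rather than fixed results; the nearest neighbors depend on the training corpus.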
  
  
==Notation==
===Algebraic notation===
  
 knee - leg = elbow - arm
  
===English logic===
  
 knee is to leg as elbow is to arm
  
===Logical analogy notation===
  
 knee : leg :: elbow : arm
  
==Models==
===Continuous bag of words (CBOW) model===
*Uses the surrounding context to predict a target word.
*Several times faster to train than skip-gram, with slightly better accuracy for frequent words.
  
===Skip-gram model===
*Uses a word to predict a target context.
*Works well with small amounts of training data and represents even rare words or phrases well.
*Produces more accurate results on large datasets (see the sketch below).
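In gensim (an assumed implementation; the Implementation section below names only DL4J and TensorFlow), the choice between the two models is a single flag. A minimal sketch:

<code python>
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]  # toy corpus

# sg=0 selects CBOW (the default); sg=1 selects skip-gram
cbow = Word2Vec(sentences, sg=0, vector_size=50, window=2, min_count=1)
skipgram = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)
</code>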
  
==Implementation==
Word2Vec can be implemented in DL4J or TensorFlow.
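No DL4J or TensorFlow snippet is reproduced here, but the 2-layer net itself is small enough to sketch. One illustrative skip-gram training step in plain NumPy (every name and size below is invented for the example; real implementations use tricks like negative sampling instead of a full softmax):

<code python>
import numpy as np

vocab = ["king", "queen", "man", "woman"]
V, D = len(vocab), 8                         # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))    # layer 1: one embedding row per word
W_out = rng.normal(scale=0.1, size=(D, V))   # layer 2: scores every vocabulary word

center, context, lr = 0, 1, 0.05             # "king" should predict "queen"

h = W_in[center]                             # hidden activation = the embedding
probs = np.exp(h @ W_out)
probs /= probs.sum()                         # softmax over the vocabulary
grad = probs.copy()
grad[context] -= 1.0                         # d(cross-entropy)/d(scores)
W_in[center] -= lr * (W_out @ grad)          # update the embedding
W_out -= lr * np.outer(h, grad)              # update the output layer

print(W_in[center])                          # "king"'s vector after one step
</code>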
  
==To research==
*Implementation
*Cosine similarity, dot product equation usage (see the sketch below)
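As a starting point for that last item: word similarity in Word2Vec is conventionally scored with cosine similarity, the dot product of two vectors divided by the product of their lengths. A small sketch with NumPy:

<code python>
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||), in [-1, 1]
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(v1, v2))  # 1.0: same direction, maximally similar
</code>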
  
==Links==
*http://deeplearning4j.org/word2vec
  