##### Differences

This shows you the differences between two versions of the page.

Old revision: mind:word2vec [2016/09/25 16:27] bayb2
New revision: mind:word2vec [2016/09/25 16:49] bayb2

Line 1:
  =Word2Vec=
- ==Notation==
- ===Algebraic notation===
- knee - leg = elbow - arm
- ===English logic===
- knee is to leg as elbow is to arm
- ===Logical analogy notation===
- knee : leg :: elbow : arm
- Word2Vec is a 2-layer neural net
- Input -> text corpus
- Output -> set of vectors (neural word embeddings)
- Can guess a word’s association with other words, or cluster documents and define them by topic.
- More research: Cosine similarity, dot product equation
- ==Models==
- ===Continuous bag of words model===
- Using context to predict a target word. Faster.
- ===Skip-gram model===
- Using a word to predict a target context. Produces more accurate results on large datasets.
- “Just as Van Gogh’s painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.”
- Each word is a point in a 500-dimensional vectorspace.
- Qualities become quantities, and similar things and ideas are shown to be “close”.
- More than three layers in a neural network (including input and output) qualifies as “deep” learning. Deep means more than one hidden layer.
- ==Implementation==
- Word2Vec can be implemented in DL4J, TensorFlow
- ==Examples==
- Rome - Italy = Beijing - China, so Rome - Italy + China = Beijing
- king : queen :: man : woman
- house : roof :: castle : [dome, bell_tower, spire, crenellations, turrets]
- China : Taiwan :: Russia : [Ukraine, Moscow, Moldova, Armenia]
+ ==Intro==
+ “Just as Van Gogh’s painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.” --DL4J
+ Word2Vec can guess a word’s association with other words, or cluster documents and define them by topic. It makes qualities into quantities, and similar things and ideas are shown to be “close” in its 500-dimension vectorspace.
+ Word2Vec is not classified as "deep learning" because it is only a 2-layer neural net.
+ Input -> text corpus
+ Output -> set of vectors, or neural word embeddings
+ ===Examples===
+ Rome - Italy = Beijing - China, so Rome - Italy + China = Beijing
+ king : queen :: man : woman
+ house : roof :: castle : [dome, bell_tower, spire, crenellations, turrets]
+ China : Taiwan :: Russia : [Ukraine, Moscow, Moldova, Armenia]
+ ==Notation==
+ ===Algebraic notation===
+ knee - leg = elbow - arm
+ ===English logic===
+ knee is to leg as elbow is to arm
+ ===Logical analogy notation===
+ knee : leg :: elbow : arm
+ ==Models==
+ ===Continuous bag of words (CBOW) model===
+ *Uses a context to predict a target word. Faster.
+ *Several times faster to train than the skip-gram, slightly better accuracy for frequent words.
+ ===Skip-gram model===
+ *Uses a word to predict a target context.
+ *Works well with small amount of the training data, represents well even rare words or phrases.
+ *Produces more accurate results on large datasets.
+ ==Implementation==
+ Word2Vec can be implemented in DL4J, TensorFlow
+ ==To research==
+ *Implementation
+ *Cosine similarity, dot product equation usage
  ==Links==
  *http://deeplearning4j.org/word2vec
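
The page lists cosine similarity and the dot product as topics still to research, and gives Rome - Italy + China = Beijing as an example. The following is a minimal sketch of how those two ideas fit together, not the page author's method or any library's API. It assumes hand-made 4-dimensional toy vectors in place of real trained 500-dimensional embeddings; the names `cosine_similarity`, `analogy`, and `TOY_VECTORS` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-picked for illustration only.
# Axes (assumed): is-a-capital, Italy, China, Russia.
TOY_VECTORS = {
    "Rome":    [1.0, 1.0, 0.0, 0.0],
    "Italy":   [0.0, 1.0, 0.0, 0.0],
    "Beijing": [1.0, 0.0, 1.0, 0.0],
    "China":   [0.0, 0.0, 1.0, 0.0],
    "Moscow":  [1.0, 0.0, 0.0, 1.0],
    "Russia":  [0.0, 0.0, 0.0, 1.0],
}


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


# Rome - Italy + China
print(analogy("Rome", "Italy", "China", TOY_VECTORS))  # prints "Beijing"
```

With these toy vectors the arithmetic lands exactly on the Beijing vector, so its cosine similarity to the target is 1. With real trained embeddings the result is only the nearest neighbour, which is why the page's other examples list several candidate answers in brackets.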