Category: implementation

Word2vec Implementation with Keras 2.0

There are many tutorials for implementing word2vec in Keras, such as:

https://adventuresinmachinelearning.com/word2vec-keras-tutorial/

http://www.claudiobellei.com/2018/01/07/backprop-word2vec-python/

https://towardsdatascience.com/understanding-feature-engineering-part-4-deep-learning-methods-for-text-data-96c44370bbfa

https://zhuanlan.zhihu.com/p/42651829

Many of these tutorials either do not implement negative sampling or, if they do, they target a Keras version below 2.0. The problem with these tutorials is that the Merge layer they rely on is deprecated in newer versions of Keras. For example, https://zhuanlan.zhihu.com/p/42651829 uses the following code for the model definition, where Merge is imported from keras.layers.

# NOTE: this import works only in Keras < 2.0; Merge was removed in 2.0
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential

# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
                         embeddings_initializer="glorot_uniform",
                         input_length=1))
word_model.add(Reshape((embed_size, )))

context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
                  embeddings_initializer="glorot_uniform",
                  input_length=1))
context_model.add(Reshape((embed_size,)))

# merge the two branches with a dot product (deprecated Merge layer)
model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")


The following is a visualization of the Keras architecture defined in the above code:

[Figure: diagram of the skip-gram architecture built with the Merge layer]

I searched through many blog posts and Stack Overflow answers for a way to adapt this code to the new Keras version. While I found some advice, I could not find an updated implementation anywhere, so after working out the solution I decided to post it here for future reference and for others facing the same problem. The aim of this tutorial is not to explain word2vec or every layer in the Keras model, since readers can refer to the links posted above, but to provide a solution to the specific problem caused by the Keras version.

In Keras 2.0, Merge is an abstract class and cannot be imported directly, so the following line produces an error:

model.add(Merge([word_model, context_model], mode="dot"))

To fix the problem, we import dot from keras.layers instead of Merge. However, the inputs to dot must be word_model.output and context_model.output, since dot expects a list of tensors, not layers. The second change concerns the definition of model: because we are merging models rather than layers, a Sequential definition is no longer enough. Instead, we use the functional API, passing word_model.input and context_model.input as the inputs to Model and the final dot_product tensor as the output. The following snippet reflects these changes in the model definition.
from keras.layers import dot
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Model, Sequential

# embedding branch for the target word
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
               embeddings_initializer="glorot_uniform",
               input_length=1))
word_model.add(Reshape((embed_size,)))

# embedding branch for the context word
context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
                  embeddings_initializer="glorot_uniform",
                  input_length=1))
context_model.add(Reshape((embed_size,)))

# dot expects a list of tensors, so we pass the models' output tensors
dot_product = dot([word_model.output, context_model.output], axes=1,
                  normalize=False)
dot_product = Dense(1, kernel_initializer="glorot_uniform",
                    activation="sigmoid")(dot_product)

# merging models requires the functional API: the models' input tensors
# become the inputs, the final dot_product tensor becomes the output
model = Model(inputs=[word_model.input, context_model.input],
              outputs=dot_product)
model.compile(loss="mean_squared_error", optimizer="rmsprop")
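
As a quick sanity check on the multi-input model, we can feed a single (word, context) pair of ids through the untrained network; the ids 1 and 2 below are arbitrary illustrative indices (assumed smaller than vocab_size), and the output is one sigmoid score per pair:

import numpy as np

# one (target, context) pair of word ids, each of shape (batch=1, input_length=1)
word_ids = np.array([[1]])
context_ids = np.array([[2]])

score = model.predict([word_ids, context_ids])
print(score.shape)  # (1, 1) -- a probability-like score in (0, 1)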


Calling model.summary() prints the new model definition:

__________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
==================================================================================================
embedding_1_input (InputLayer)   (None, 1)             0
__________________________________________________________________________________________________
embedding_2_input (InputLayer)   (None, 1)             0
__________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 1, 16)         112         embedding_1_input[0][0]
__________________________________________________________________________________________________
embedding_2 (Embedding)          (None, 1, 16)         112         embedding_2_input[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 16)            0           embedding_1[0][0]
__________________________________________________________________________________________________
reshape_2 (Reshape)              (None, 16)            0           embedding_2[0][0]
__________________________________________________________________________________________________
dot_1 (Dot)                      (None, 1)             0           reshape_1[0][0]
                                                                   reshape_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1)             2           dot_1[0][0]
==================================================================================================
Total params: 226
Trainable params: 226
Non-trainable params: 0
__________________________________________________________________________________________________

The model can be visualized as follows:

[Figure: diagram of the new skip-gram model architecture]
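
One way to generate such a diagram is Keras's built-in plotting utility; this is a minimal sketch, assuming the pydot and graphviz packages are installed and "skipgram_model.png" is just an illustrative file name:

from keras.utils import plot_model

# render the architecture, including tensor shapes, to an image file
plot_model(model, to_file="skipgram_model.png", show_shapes=True)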


Every other step for training the model remains the same, as sketched below. That's it!
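
For completeness, here is a minimal training sketch following the pattern used in the tutorials linked above. It assumes wids (the corpus encoded as lists of word ids) and vocab_size already exist; the window size, negative-sampling rate, and epoch count are illustrative choices:

import numpy as np
from keras.preprocessing.sequence import skipgrams

for epoch in range(1, 6):
    loss = 0.0
    for wid_sequence in wids:
        # skipgrams generates (target, context) pairs labeled 1 for true
        # context words and 0 for randomly drawn negative samples
        pairs, labels = skipgrams(wid_sequence, vocabulary_size=vocab_size,
                                  window_size=2, negative_samples=1.0)
        if not pairs:
            continue
        targets, contexts = zip(*pairs)
        x = [np.array(targets, dtype="int32").reshape(-1, 1),
             np.array(contexts, dtype="int32").reshape(-1, 1)]
        y = np.array(labels, dtype="int32")
        loss += model.train_on_batch(x, y)
    print("Epoch:", epoch, "Loss:", loss)

# the learned word vectors are the weights of the target-word embedding
word_vectors = word_model.layers[0].get_weights()[0]  # shape: (vocab_size, embed_size)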


References:
https://stackoverflow.com/questions/46397258/how-to-merge-sequential-models-in-keras-2-0
https://zhuanlan.zhihu.com/p/42651829