There are many tutorials for implementing word2vec in Keras, such as:
https://adventuresinmachinelearning.com/word2vec-keras-tutorial/
http://www.claudiobellei.com/2018/01/07/backprop-word2vec-python/
https://zhuanlan.zhihu.com/p/42651829
Many of these tutorials either do not implement negative sampling or, if they do, only target Keras versions below 2.0. The problem is that the Merge layer they rely on is deprecated in newer versions of Keras. For example, https://zhuanlan.zhihu.com/p/42651829 uses the following code for the model definition, where Merge is imported from keras.layers.
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential

# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size, embeddings_initializer="glorot_uniform", input_length=1))
word_model.add(Reshape((embed_size,)))

context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size, embeddings_initializer="glorot_uniform", input_length=1))
context_model.add(Reshape((embed_size,)))

model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
[Figure: visualization of the Keras architecture defined by the code above.]

I searched through many blog posts and Stack Overflow answers for a way to adapt this code to the newer Keras versions. While I found some advice, I did not find a complete updated implementation anywhere, so after figuring out the solution I decided to post it here for future reference and for others facing the same problem. The aim of this tutorial is not to explain word2vec or every layer in the Keras model, since readers can refer to the links posted above, but to provide a solution to the specific problem caused by the Keras version. In Keras 2.0, Merge is an abstract class and cannot be imported directly, so the following line produces an error:

model.add(Merge([word_model, context_model], mode="dot"))
To fix the problem, instead of using the Merge layer, we import dot from keras.layers. However, the inputs to the dot function must be word_model.output and context_model.output, since dot expects a list of tensors, not layers. The second change concerns the definition of the model itself: because we are merging two models rather than two layers, a Sequential definition is no longer enough. Instead, we use the functional API, passing word_model.input and context_model.input as the inputs to Model, with dot_product as the output. The following snippet reflects these changes in the model definition.
from keras.layers import dot
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Model, Sequential

# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size, embeddings_initializer="glorot_uniform", input_length=1))
word_model.add(Reshape((embed_size,)))

context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size, embeddings_initializer="glorot_uniform", input_length=1))
context_model.add(Reshape((embed_size,)))

# merge the output tensors of the two towers, then classify the pair
dot_product = dot([word_model.output, context_model.output], axes=1, normalize=False)
dot_product = Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid")(dot_product)

# functional API: the model is defined by its input and output tensors
model = Model(inputs=[word_model.input, context_model.input], outputs=dot_product)
model.compile(loss="mean_squared_error", optimizer="rmsprop")
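As an aside, keras.layers also exposes the Dot layer class, for which the lowercase dot used above is the functional wrapper. A minimal sketch of the equivalent call, using the same tensors and axes as above:

from keras.layers import Dot

# Dot(axes=1) instantiates the layer; calling it on the two output tensors
# is equivalent to dot([word_model.output, context_model.output], axes=1)
dot_product = Dot(axes=1, normalize=False)([word_model.output, context_model.output])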
Calling model.summary() on the new model definition produces:

__________________________________________________________________________________________________
Layer (type)                    Output Shape        Param #     Connected to
==================================================================================================
embedding_1_input (InputLayer)  (None, 1)           0
__________________________________________________________________________________________________
embedding_2_input (InputLayer)  (None, 1)           0
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 1, 16)       112         embedding_1_input[0][0]
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 1, 16)       112         embedding_2_input[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 16)          0           embedding_1[0][0]
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 16)          0           embedding_2[0][0]
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1)           0           reshape_1[0][0]
                                                                reshape_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)           2           dot_1[0][0]
==================================================================================================
Total params: 226
Trainable params: 226
Non-trainable params: 0
__________________________________________________________________________________________________

[Figure: visualization of the updated model architecture.]
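For completeness, the following is a minimal, non-authoritative sketch of one way to generate training pairs and fit the model above. The toy corpus, window size, and number of epochs are hypothetical placeholders, and the sketch assumes vocab_size and embed_size match the values used in the model definition; skipgrams from keras.preprocessing.sequence produces (target, context) pairs labeled 1 for observed pairs and 0 for negative samples.

import numpy as np
from keras.preprocessing.sequence import skipgrams
from keras.preprocessing.text import Tokenizer

# hypothetical toy corpus; in practice use your own text
corpus = ["the quick brown fox jumps over the lazy dog"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
word_ids = tokenizer.texts_to_sequences(corpus)[0]
vocab_size = len(tokenizer.word_index) + 1  # must match the model definition
embed_size = 16                             # must match the model definition

# generate (target, context) pairs; label 1 = real pair, 0 = negative sample
pairs, labels = skipgrams(word_ids, vocabulary_size=vocab_size, window_size=2, negative_samples=1.0)
word_targets = np.array([p[0] for p in pairs], dtype="int32")
word_contexts = np.array([p[1] for p in pairs], dtype="int32")
labels = np.array(labels, dtype="int32")

# train for a few epochs; each input array feeds its corresponding Embedding tower
for epoch in range(5):
    loss = model.train_on_batch([word_targets, word_contexts], labels)
    print("epoch", epoch, "loss", loss)

# the learned word vectors live in the first tower's Embedding layer
word_vectors = word_model.layers[0].get_weights()[0]  # shape: (vocab_size, embed_size)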
As the sketch above suggests, every other step for training the model remains the same. That's it!

References:
https://stackoverflow.com/questions/46397258/how-to-merge-sequential-models-in-keras-2-0
https://zhuanlan.zhihu.com/p/42651829