skip-thought model
Deep Learning Reading Group: Skip-Thought Vectors - ThetaZero
Continuing the tour of older papers that started with our ResNet blog post, we now take on Skip-Thought Vectors by Kiros et al. Their goal was to come up with a useful embedding for sentences that was not tuned for a single task and did not require labeled data to train. They took inspiration from the Word2Vec skip-gram model and attempted to extend it to sentences. Skip-thought vectors are created using an encoder-decoder model. The encoder takes in the training sentence and outputs a vector.
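To make the skip-gram analogy concrete: in the Kiros et al. paper, the vector produced by the encoder is used by decoders to predict the sentences immediately before and after the input sentence, so training data can be viewed as (previous, current, next) triples drawn from running text. The sketch below only builds such triples from an ordered list of sentences. The function name `skip_thought_triples` and the toy passage are illustrative assumptions, not part of any published implementation, and no model is trained here.

```python
from typing import List, Tuple


def skip_thought_triples(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Build (previous, current, next) triples from sentences in document order.

    Hypothetical helper for illustration: the middle sentence is what an
    encoder would read, and the two neighbors are what decoders would
    be trained to predict.
    """
    # Only sentences that have both a predecessor and a successor qualify,
    # so the first and last sentences never appear in the middle slot.
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]


# Toy passage (an assumption for this sketch, not real training data).
passage = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
]
triples = skip_thought_triples(passage)
for previous, current, following in triples:
    print(f"encode: {current!r} -> predict: {previous!r} and {following!r}")
```

A passage with fewer than three sentences yields no triples, which is one way to see why the model needs contiguous text rather than isolated sentences.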
tensorflow/models
The Skip-Thoughts model is a sentence encoder. It learns to encode input sentences into a fixed-dimensional vector representation that is useful for many tasks, for example detecting paraphrases or classifying whether a product review is positive or negative. See the Skip-Thought Vectors paper for details of the model architecture and more example applications. A trained Skip-Thoughts model will encode similar sentences near each other in the embedding vector space. The following examples show the nearest neighbor by cosine similarity of some sentences from the movie review dataset.
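As a rough illustration of what "nearest neighbor by cosine similarity" means, the sketch below looks up the closest sentence in a small table of vectors. The three-dimensional vectors and sentences are made-up toy values standing in for real encoder output, and `cosine_similarity` and `nearest_neighbor` are hypothetical helper names, not functions from the tensorflow/models repository.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(query: str, embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the sentence (other than the query) whose vector is most similar."""
    query_vec = embeddings[query]
    candidates = [s for s in embeddings if s != query]
    return max(candidates, key=lambda s: cosine_similarity(query_vec, embeddings[s]))


# Toy vectors (assumed values); a real encoder would produce far
# higher-dimensional vectors.
toy_embeddings = {
    "the film was great": [0.9, 0.1, 0.0],
    "i loved this movie": [0.8, 0.2, 0.1],
    "the plot made no sense": [0.1, 0.9, 0.2],
}
neighbor = nearest_neighbor("the film was great", toy_embeddings)
print(neighbor)
```

Cosine similarity compares direction rather than magnitude, so two sentences count as close when their vectors point the same way, regardless of vector length.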
Lab41 Reading Group: Skip-Thought Vectors
Their model requires groups of consecutive sentences in order to train, so it was trained on the BookCorpus dataset. The dataset consists of novels by unpublished authors and is (unsurprisingly) dominated by romance and fantasy novels. This "bias" in the dataset will become apparent later when discussing some of the sentences used to test the skip-thought model; some of the retrieved sentences are quite exciting! Building a model that accounts for the meaning of an entire sentence is tough because language is remarkably flexible. Changing a single word can either completely change the meaning of a sentence or leave it unaltered.