Sentiment Analysis of Movie Reviews (3): doc2vec
This is the last – for now – installment of my mini-series on sentiment analysis of the Stanford collection of IMDB reviews (originally published on recurrentnull.wordpress.com). So far, we've had a look at classical bag-of-words models and word vectors (word2vec). We saw that from the classifiers used, logistic regression performed best, be it in combination with bag-of-words or word2vec. We also saw that while the word2vec model did in fact model semantic dimensions, it was less successful for classification than bag-of-words, and we explained that by the averaging of word vectors we had to perform to obtain input features on review (not word) level. So the question now is: How would distributed representations perform if we did not have to throw away information by averaging word vectors?
Nov-2-2016, 04:55:03 GMT