Sentence Similarity in Python using Doc2Vec – Kanoki


Representing text documents numerically is a challenging task in machine learning. There are several ways to create numerical features for text, such as vector representations based on Bag of Words or TF-IDF. I am not going to go into detail about the advantages of one over the other, or which is best in which case; there are plenty of good reads available that explain this.

Word2Vec is a model for creating word embeddings: it takes a large corpus of text as input and produces a vector space, typically of several hundred dimensions. The underlying assumption of Word2Vec is that two words that share similar contexts also share a similar meaning, and consequently get a similar vector representation from the model. For instance, "bank", "money" and "accounts" are often used in similar situations, with similar surrounding words like "dollar", "loan" or "credit", so according to Word2Vec they will share a similar vector representation.
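To make the "similar contexts, similar vectors" assumption concrete, here is a minimal sketch that uses only the Python standard library. It is not Word2Vec itself: instead of training a neural model, it simply counts which words appear in the same sentence and compares those count vectors with cosine similarity. The toy corpus, the stopword list and the helper names (`context_vectors`, `cosine`) are all made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stopword list and toy corpus, invented for this illustration.
STOPWORDS = {"the", "a", "in", "and", "with", "from", "was"}

CORPUS = [
    "the bank approved a loan in dollar terms",
    "money from the loan was paid with credit",
    "accounts hold dollar and credit balances",
    "the cat chased a mouse in the garden",
    "the dog chased the cat",
]


def context_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            for j, neighbour in enumerate(tokens):
                if i != j:
                    vectors[word][neighbour] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(CORPUS)

# "bank" and "money" both occur near "loan", so their vectors overlap;
# "bank" and "cat" share no context words in this corpus.
sim_bank_money = cosine(vectors["bank"], vectors["money"])
sim_bank_cat = cosine(vectors["bank"], vectors["cat"])

print(f"bank vs money: {sim_bank_money:.3f}")
print(f"bank vs cat:   {sim_bank_cat:.3f}")
```

On this tiny corpus "bank" comes out closer to "money" than to "cat" purely because of shared neighbours. Word2Vec relies on the same intuition, but learns dense vectors from a much larger corpus rather than using raw counts.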