Word Embeddings in High-Level
The most common representation of words in NLP tasks is the One Hot Encoding. Below we can see an example of One Hot Encoding for the words "Cat" and "Dog". As we can see, these two vectors are independent since their inner product is 0, and their Euclidean distance is the square root of 2. Notice that this applies to every pair in the vocabulary, meaning that every pair of words are independent, and their distance is the square root of 2. Notice that this applies to every pair in the vocabulary, meaning that every pair of words are independent, and their distance is \(\sqrt(2)\). For example, the words below are considered independent, and the distance -- similarity between any pair of words is the same. This is an issue for NLP tasks since we want to be able to capture the relation between words.
Nov-30-2020, 21:19:25 GMT
- Industry:
- Technology: