We know of a few types of word analogy, like "France capital Paris" and "US currency dollar", but has anyone tried to search for all the possible analogies that can be deduced from word2vec? One would have to find modifiers that have multiple matches of the form "word1 modifier word2". One possible algorithm is to cluster all the difference vectors (word1-word2, for all pairs of words) and select words that are close to the centers of dense clusters. Even if we don't find all the modifiers, we can infer more by combining the results with ontologies such as WordNet. If we found all the types of analogy, we could build a large test dataset to benchmark how well the various word embeddings represent analogies.
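To make the clustering idea concrete, here is a minimal sketch on hand-built toy vectors, not real word2vec output. Everything in it is an assumption for illustration: the helper names (`build_toy_embeddings`, `difference_vectors`, `dense_clusters`), the 5-dimensional vectors, the `eps` and `min_size` thresholds, and the greedy "most neighbours first" clustering, which stands in for whatever density-based method one would actually use.

```python
import itertools
import numpy as np


def build_toy_embeddings(seed=0):
    """Hypothetical toy vectors: each capital/currency is its country plus a
    shared offset and a little bounded noise. Not real word2vec output."""
    rng = np.random.default_rng(seed)
    triples = [
        ("france", "paris", "euro"),
        ("usa", "washington", "dollar"),
        ("japan", "tokyo", "yen"),
        ("uk", "london", "pound"),
    ]
    capital_offset = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
    currency_offset = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    embeddings = {}
    for i, (country, capital, currency) in enumerate(triples):
        base = np.zeros(5)
        if i > 0:
            base[i - 1] = 1.0  # give each country its own direction
        for word, offset in (
            (country, np.zeros(5)),
            (capital, capital_offset),
            (currency, currency_offset),
        ):
            noise = rng.uniform(-0.02, 0.02, size=5)
            embeddings[word] = base + offset + noise
    return embeddings


def difference_vectors(embeddings):
    """Return every ordered pair (a, b) of distinct words and vec(b) - vec(a)."""
    words = sorted(embeddings)
    pairs = list(itertools.permutations(words, 2))
    diffs = np.array([embeddings[b] - embeddings[a] for a, b in pairs])
    return pairs, diffs


def dense_clusters(pairs, diffs, eps=0.3, min_size=4):
    """Greedy density clustering: repeatedly take the difference vector with
    the most unassigned neighbours within eps and peel off that group."""
    dist = np.linalg.norm(diffs[:, None, :] - diffs[None, :, :], axis=-1)
    close = dist <= eps
    remaining = np.ones(len(pairs), dtype=bool)
    clusters = []
    while True:
        counts = (close & remaining[None, :]).sum(axis=1)
        counts[~remaining] = 0
        centre = int(np.argmax(counts))
        if counts[centre] < min_size:
            break
        members = np.flatnonzero(close[centre] & remaining)
        clusters.append([pairs[i] for i in members])
        remaining[members] = False
    return clusters


embeddings = build_toy_embeddings()
pairs, diffs = difference_vectors(embeddings)
clusters = dense_clusters(pairs, diffs)
for cluster in clusters:
    print(cluster)
```

Each printed cluster is one candidate analogy type, e.g. all the (country, capital) pairs. Note that this brute-force version builds every ordered pair and a full pairwise distance matrix, so it only works for a tiny vocabulary; on a real model one would need to restrict the candidate pairs or use an approximate nearest-neighbour search. It also stops at grouping the pairs and does not try to find a modifier word near each cluster center.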
May-21-2016, 16:50:30 GMT