On the Dimensionality of Word Embedding

Neural Information Processing Systems

In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embeddings, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This trade-off sheds light on previously unexplained empirical observations, such as the existence of an optimal dimensionality. It also reveals new insights, for example when and how word embeddings are robust to over-fitting. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.
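The abstract's key object is the PIP loss, which in the paper is defined via the pairwise inner product matrix E E^T of an embedding matrix E. A minimal NumPy sketch of that formulation (the Frobenius norm of the difference between the two PIP matrices; the function name `pip_loss` and the toy data are illustrative, not from the paper):

```python
import numpy as np

def pip_loss(E1, E2):
    """PIP loss: Frobenius norm of the difference between the
    pairwise-inner-product (PIP) matrices E @ E.T of two embeddings.
    The PIP matrix is unchanged by any orthogonal rotation of E,
    which is what makes the loss unitary-invariant."""
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro")

# Toy check: a rotated copy of an embedding should have (near-)zero PIP loss.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))            # 5 "words", dimension 3
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
print(pip_loss(E, E @ Q))  # ~0 (up to floating-point error)
```

Note that the two embeddings may even have different dimensionalities: E1 E1^T and E2 E2^T are both vocabulary-by-vocabulary matrices, which is what lets the PIP loss compare embeddings across dimensions in the bias-variance analysis.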



a878dbebc902328b41dbf02aa87abb58-AuthorFeedback.pdf


Other baselines (LTS/Void/GES) were faster (<1h), because the toy problems were not simulator-time-dominated. The extra assumptions come from defining a model class (the generative model) that restricts the set of potential solutions and allows interpolation. The generative model builds a continuous approximation of the simulator distribution from the samples and thus implicitly regularizes the solution, which is not done with numerical methods. So the generative model has more information, in the form of the implicit prior defined by the choice of model and optimization scheme.