Large embedding layers are a performance problem for fitting models: even though the gradients are sparse (only a handful of user and item vectors need parameter updates in every minibatch), PyTorch updates the entire embedding layer at every backward pass. Computation time is then wasted on applying zero gradient steps to the whole embedding matrix. To alleviate this problem, we can use a smaller underlying embedding layer and probabilistically hash users and items into that smaller space. With good hash functions, collisions should be rare, and we should observe fitting speedups without a decrease in accuracy. The implementation in Spotlight follows the RecSys 2017 paper "Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks".
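The hashing idea can be sketched in a few lines: each raw id is mapped by several hash functions into a much smaller table, and its representation is the average of the rows it hashes to. This is a minimal NumPy illustration of the technique, not Spotlight's actual `BloomEmbedding` implementation; the class and function names here (`BloomEmbedding`, `bloom_indices`) and all parameter choices are hypothetical.

```python
import hashlib

import numpy as np


def bloom_indices(item_id, num_hashes, table_size):
    # Map one raw id to `num_hashes` slots in the compressed table.
    # Seeding the hash input with a different prefix per hash function
    # gives us several independent-ish hash functions from one digest.
    return [
        int(hashlib.md5(f"{seed}:{item_id}".encode()).hexdigest(), 16) % table_size
        for seed in range(num_hashes)
    ]


class BloomEmbedding:
    """Hypothetical minimal bloom-style embedding (not Spotlight's API)."""

    def __init__(self, table_size, dim, num_hashes=2, seed=0):
        rng = np.random.default_rng(seed)
        # The underlying table is much smaller than the raw id space.
        self.table = rng.standard_normal((table_size, dim)) * 0.01
        self.num_hashes = num_hashes
        self.table_size = table_size

    def __call__(self, item_id):
        idx = bloom_indices(item_id, self.num_hashes, self.table_size)
        # The item's vector is the mean of its hashed rows; in training,
        # only these few rows would receive gradient updates.
        return self.table[idx].mean(axis=0)


# An id far larger than the table still gets a dense vector.
emb = BloomEmbedding(table_size=1000, dim=8)
vector = emb(123_456_789)
```

Because the table has only `table_size` rows regardless of how many distinct ids exist, the dense backward pass touches a much smaller matrix, which is where the fitting speedup comes from.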
Sep-9-2017, 03:00:55 GMT