Reviews: Sampled Softmax with Random Fourier Features

Neural Information Processing Systems 

As a result, I will retain my scores and recommend this paper for acceptance. I kindly ask the authors to incorporate all the promised changes into the camera-ready version.

In such problems, it becomes expensive to evaluate the log-partition function for each instance in the training sample. The main idea is to approximate the log-partition function by sampling a small number of scores corresponding to negative labels (labels different from the one assigned to a training sample). The model is given in Eq. (1), where the score for the i-th class is the inner product between a representation h of an instance and a parameter vector c_i representing the class.
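
To make the summarized idea concrete, here is a minimal sketch of approximating the log-partition function from a few sampled negative scores. It is an illustration under stated assumptions, not the authors' method: negatives are drawn uniformly with replacement (the paper's contribution concerns a better sampling distribution, which is not reproduced here), the standard importance correction log(m * q) is subtracted from each sampled score, and the function names `full_log_partition` and `sampled_log_partition` are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def full_log_partition(h, C):
    """Exact log-partition: log sum_i exp(<h, c_i>) over all classes."""
    return log_sum_exp([dot(h, c) for c in C])


def sampled_log_partition(h, C, target, num_neg, rng):
    """Estimate the log-partition from the true class plus sampled negatives.

    Assumption: negatives are sampled uniformly with replacement from the
    classes other than `target`, so q = 1 / (n - 1) for every negative.
    """
    negatives = [i for i in range(len(C)) if i != target]
    q = 1.0 / len(negatives)
    sampled = [rng.choice(negatives) for _ in range(num_neg)]
    # Subtracting log(num_neg * q) weights each sampled term by 1 / (m * q),
    # which makes the sum over negatives unbiased for the true negative sum.
    correction = math.log(num_neg * q)
    adjusted = [dot(h, C[target])]
    adjusted += [dot(h, C[i]) - correction for i in sampled]
    return log_sum_exp(adjusted)
```

With this correction the estimate of the partition function itself is unbiased, while its logarithm is in general biased; how strongly depends on the sampling distribution, which is why uniform sampling is only a baseline choice here.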