AITopics | sampled softmax

Sampled Softmax with Random Fourier Features

Neural Information Processing SystemsDec-26-2025, 02:10:43 GMT

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution. We show that RF-softmax leads to low bias in estimation in terms of both the full softmax distribution and the full softmax gradient.

random fourier feature, sampled softmax, softmax distribution, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Sampled Softmax with Random Fourier Features

Neural Information Processing SystemsMay-27-2025, 17:04:10 GMT

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution.

random fourier feature, sampled softmax, softmax distribution, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Reviews: Sampled Softmax with Random Fourier Features

Neural Information Processing SystemsJan-27-2025, 17:13:54 GMT

As a result, I will retain my scores and recommend this paper for acceptance. I kindly ask the authors to incorporate all the promised changes to the camera ready version. In such problems, it becomes expensive to evaluate the log-partition function for each instance from training sample. The main idea is to approximate the log-partition function by sampling a small number of scores corresponding to negative labels (different from the label assigned to a training sample). The model is given in Eq. (1), where the score for the i-th class is given by the inner product between a representation of an instance h and a parameter vector c_i representing the class.

exponential family model, log-partition function, negative label, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Add feedback

Reviews: Sampled Softmax with Random Fourier Features

Neural Information Processing SystemsJan-27-2025, 17:13:43 GMT

The reviewers all considered the author feedback and discussed the paper thoroughly. Most concerns could thereby be clarified but not all. Overall, I consider the contribution sufficiently interesting and valuable to justify accepting the paper as a poster.

random fourier feature, sampled softmax

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Add feedback

Sampled Softmax with Random Fourier Features

Neural Information Processing SystemsOct-11-2024, 03:23:14 GMT

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution.

random fourier feature, sampled softmax, softmax distribution, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Yang, Xiaodong, Chen, Huiyuan, Yan, Yuchen, Tang, Yuxin, Zhao, Yuying, Xu, Eric, Cai, Yiwei, Tong, Hanghang

arXiv.org Artificial IntelligenceJun-23-2024

The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a \underline{Sim}plified Sampled Softmax \underline{C}ross-\underline{E}ntropy Loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.

negative sample, proceedings, simce, (16 more...)

arXiv.org Artificial Intelligence

2406.1617

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Sampled Softmax with Random Fourier Features

Rawat, Ankit Singh, Chen, Jiecao, Yu, Felix Xinnan X., Suresh, Ananda Theertha, Kumar, Sanjiv

Neural Information Processing SystemsMar-19-2020, 02:17:30 GMT

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution.

random fourier feature, sampled softmax, softmax distribution, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback