Reviews: One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities

Neural Information Processing Systems 

In my view, the main reason the proposed lower bound is interesting is that it offers a potential way to speed up training for multi-class models with a very large number of classes. While it is useful to understand other properties of the lower bound, the paper could be improved by emphasizing this primary use case in machine learning.

Figure 1c and Figure 3 need a clearer explanation of what is being displayed and why it is important. In particular, what value is being plotted on the y-axis, and at what setting of the parameters w? Here is how I understand it, for Figure 1c say:

- Blue line: value of Eq. (13) at the setting of the parameters w that maximizes Eq. (13).
- Red line: value of Eq. (13) at the setting of the parameters w that maximizes Eq. (14).
- Green line: value of Eq. (13)? at the setting of the parameters w that maximizes the Bouchard lower bound (?).
- Red dashed line: value of Eq. (13)? at the parameters w obtained after the given number of training iterations?
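For context on the speed-up use case noted above: the paper's one-vs-each bound states that log p(y) >= sum_{k != y} log sigma(f_y - f_k), where f are the class scores, and the sum over negative classes can be estimated unbiasedly from a random subset, which is what makes training with very many classes cheap. Below is a minimal NumPy sketch of this idea; the function and variable names (log_softmax_prob, ove_lower_bound, n_samples, etc.) are my own illustrative choices, not the paper's.

```python
import numpy as np

def log_softmax_prob(f, y):
    """Exact softmax log-probability of class y; costs O(K) per example."""
    return f[y] - np.logaddexp.reduce(f)

def ove_lower_bound(f, y):
    """One-vs-each bound: log p(y) >= sum_{k != y} log sigma(f_y - f_k)."""
    diffs = f[y] - np.delete(f, y)
    # log sigma(d) = -log(1 + exp(-d)), computed stably via logaddexp
    return -np.sum(np.logaddexp(0.0, -diffs))

def ove_bound_subsampled(f, y, n_samples, rng):
    """Unbiased estimate of the bound from a random subset of negative classes."""
    negatives = np.delete(np.arange(len(f)), y)
    sub = rng.choice(negatives, size=n_samples, replace=False)
    diffs = f[y] - f[sub]
    # Rescale so the expectation matches the full sum over negatives
    return -(len(negatives) / n_samples) * np.sum(np.logaddexp(0.0, -diffs))

rng = np.random.default_rng(0)
K = 10_000            # number of classes (illustrative)
f = rng.normal(size=K)
y = 3
print("exact log p(y):", log_softmax_prob(f, y))   # upper-bounds the next line
print("OVE bound     :", ove_lower_bound(f, y))
print("subsampled    :", ove_bound_subsampled(f, y, 100, rng))
```

The point of the sketch is that the subsampled estimator touches only n_samples of the K - 1 negative classes per example, whereas the exact softmax normalizer requires all K scores; this is the scalability argument the paper should foreground.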