Rankmax



2 Projection on the (n, k)-simplex. We consider the following projection problem: p_α(z) = argmin …

Neural Information Processing Systems



Rankmax: An Adaptive Projection Alternative to the Softmax Function

Neural Information Processing Systems

Several machine learning models involve mapping a score vector to a probability vector. Usually, this is done by projecting the score vector onto a probability simplex, and such projections are often characterized as Lipschitz continuous approximations of the argmax function, whose Lipschitz constant is controlled by a parameter that is similar to a softmax temperature. The aforementioned parameter has been observed to affect the quality of these models and is typically either treated as a constant or decayed over time. In this work, we propose a method that adapts this parameter to individual training examples. The resulting method exhibits desirable properties, such as sparsity of its support and numerically efficient implementation, and we find that it significantly outperforms competing non-adaptive projection methods. In our analysis, we also derive the general solution of (Bregman) projections onto the (n, k)-simplex, a result which may be of independent interest.
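To make the abstract's setup concrete: for a fixed temperature-like parameter α, the Euclidean projection of αz onto the probability simplex can be computed with one sort. The sketch below illustrates that kind of projection (the sparsemax operator); it is not the paper's Rankmax operator, which additionally adapts α per training example.

```python
import numpy as np

def project_simplex(z, alpha=1.0):
    """Euclidean projection of alpha * z onto the probability simplex.

    Illustrative sketch of the projection family the abstract describes;
    a small alpha yields a near-uniform (dense) output, a large alpha a
    sparse, argmax-like output.
    """
    v = alpha * np.asarray(z, dtype=float)
    u = np.sort(v)[::-1]                      # scores in descending order
    cssv = np.cumsum(u) - 1.0                 # cumulative sums, shifted by the simplex constraint
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
    tau = cssv[rho] / (rho + 1.0)             # threshold below which mass is clipped to zero
    return np.maximum(v - tau, 0.0)
```

Note how the support shrinks as α grows: `project_simplex([3.0, 1.0, 0.2], alpha=1.0)` already collapses to a one-hot vector, while `alpha=0.1` keeps all three coordinates positive.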




Review for NeurIPS paper: Rankmax: An Adaptive Projection Alternative to the Softmax Function

Neural Information Processing Systems

Strengths: * The paper derives a continuous approximation of the k-argmax function as a generic projection of a score vector onto the (n, 1)-simplex or the (n, k)-simplex (for predicting the top-k relevant labels), based on a strongly convex function g. The first interesting contribution shows how to obtain such an approximation and derives the general solution of this problem given some properties of g (it is separable and 1-strongly convex); relevant choices of g include the quadratic function and the negative entropy. Specifically, the Rankmax operator is devised as the Euclidean projection whose Lipschitz parameter \alpha is adapted to the training instance. The key element is that \alpha can be computed such that the sample's label occurs in the top-k.
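One way to picture the adaptive idea the review describes: find the largest α at which the true label still receives positive probability under the Euclidean projection. The bisection below is a hypothetical numerical illustration of that criterion; the function names and the search procedure are assumptions for this sketch, whereas the paper computes the adapted α directly rather than by search.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
    tau = cssv[rho] / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def adaptive_alpha(z, label, lo=1e-6, hi=1e6, iters=60):
    """Largest alpha keeping `label` in the projection's support.

    Hypothetical bisection illustrating the per-example adaptation;
    returns ~hi when the label is the argmax (always in the support).
    """
    z = np.asarray(z, dtype=float)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)          # geometric bisection over scales
        p = project_simplex(mid * z)
        if p[label] > 0:
            lo = mid                    # label still in support; try a larger alpha
        else:
            hi = mid                    # label dropped; back off
    return lo
```

For z = [3.0, 1.0, 0.2] and label 1, the support condition gives the threshold α = 1/(z₀ − z₁) = 0.5: any smaller α keeps label 1 in the support, any larger α drops it.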


Review for NeurIPS paper: Rankmax: An Adaptive Projection Alternative to the Softmax Function

Neural Information Processing Systems

Three knowledgeable referees support acceptance for the contributions. One reviewer (R1) was slightly on the reject side but I discounted that review because of low confidence. However, please consider revising your paper to include suggested references, as also promised in the rebuttal, and if possible also extend your empirical evaluation.

