Goto

Collaborating Authors

 surrogate gap



Exploiting the Surrogate Gap in Online Multiclass Classification

Neural Information Processing Systems

In the full information setting we provide expected mistake bounds for \textproc{Gaptron} with respect to the logistic loss, hinge loss, and the smooth hinge loss with $O(K)$ regret, where the expectation is with respect to the learner's randomness and $K$ is the number of classes. In the bandit classification setting we show that \textproc{Gaptron} is the first linear time algorithm with $O(K\sqrt{T})$ expected regret. Additionally, the expected mistake bound of \textproc{Gaptron} does not depend on the dimension of the feature vector, contrary to previous algorithms with $O(K\sqrt{T})$ regret in the bandit classification setting. We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses rather than exploiting properties such as exp-concavity or mixability, which are traditionally used to prove logarithmic or constant regret bounds.


Non-Stationary Online Structured Prediction with Surrogate Losses

Sakaue, Shinsaku, Bao, Han, Cao, Yuzhou

arXiv.org Artificial Intelligence

Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. Therein the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur the surrogate loss growing linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of the online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.




Review for NeurIPS paper: Exploiting the Surrogate Gap in Online Multiclass Classification

Neural Information Processing Systems

Summary and Contributions: This paper considers online multiclass classification, in both the full-information and bandit settings, where the objective is to minimize regret with respect to the hinge loss (or logistic loss or smooth hinge loss) of the best linear predictor in hindsight. The loss of the online algorithm is 0/1-loss, so "bandit feedback" (only learning the loss of the action chosen) corresponds to only learning if the prediction made was correct or not. The paper presents a new algorithm that has a particularly good combination of running time and regret bound in the bandit setting. One thing I was looking for but couldn't find (maybe I missed it) is a discussion of what makes multiclass special. The gap between loss functions, e.g., as given in Figure 1, holds for binary too.


Review for NeurIPS paper: Exploiting the Surrogate Gap in Online Multiclass Classification

Neural Information Processing Systems

The paper presents a new algorithm for a problem that is relevant for the conference. The paper is unclear in some places, however. Therefore, we strongly urge you to work to make the presentation simpler (e.g., is it possible to simplify the notation in some way to make the paper more "broadly accessible" as suggested by reviewer #2).


Meta Curvature-Aware Minimization for Domain Generalization

Chen, Ziyang, Ye, Yiwen, Tang, Feilong, Pan, Yongsheng, Xia, Yong

arXiv.org Artificial Intelligence

Domain generalization (DG) aims to enhance the ability of models trained on source domains to generalize effectively to unseen domains. Recently, Sharpness-Aware Minimization (SAM) has shown promise in this area by reducing the sharpness of the loss landscape to obtain more generalized models. However, SAM and its variants sometimes fail to guide the model toward a flat minimum, and their training processes exhibit limitations, hindering further improvements in model generalization. In this paper, we first propose an improved model training process aimed at encouraging the model to converge to a flat minima. To achieve this, we design a curvature metric that has a minimal effect when the model is far from convergence but becomes increasingly influential in indicating the curvature of the minima as the model approaches a local minimum. Then we derive a novel algorithm from this metric, called Meta Curvature-Aware Minimization (MeCAM), to minimize the curvature around the local minima. Specifically, the optimization objective of MeCAM simultaneously minimizes the regular training loss, the surrogate gap of SAM, and the surrogate gap of meta-learning. We provide theoretical analysis on MeCAM's generalization error and convergence rate, and demonstrate its superiority over existing DG methods through extensive experiments on five benchmark DG datasets, including PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet. Code will be available on GitHub.


Exploiting the Surrogate Gap in Online Multiclass Classification

Neural Information Processing Systems

In the full information setting we provide expected mistake bounds for \textproc{Gaptron} with respect to the logistic loss, hinge loss, and the smooth hinge loss with O(K) regret, where the expectation is with respect to the learner's randomness and K is the number of classes. In the bandit classification setting we show that \textproc{Gaptron} is the first linear time algorithm with O(K\sqrt{T}) expected regret. Additionally, the expected mistake bound of \textproc{Gaptron} does not depend on the dimension of the feature vector, contrary to previous algorithms with O(K\sqrt{T}) regret in the bandit classification setting. We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses rather than exploiting properties such as exp-concavity or mixability, which are traditionally used to prove logarithmic or constant regret bounds.


Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

Sakaue, Shinsaku, Bao, Han, Tsuchiya, Taira, Oki, Taihei

arXiv.org Artificial Intelligence

This paper studies online structured prediction with full-information feedback. For online multiclass classification, van der Hoeven (2020) has obtained surrogate regret bounds independent of the time horizon, or \emph{finite}, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses including the logistic loss for multiclass classification, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(B^2)$, where $B$ is the $\ell_2$-diameter of the domain. This bound is tight up to logarithmic factors and improves the previous bound of $O(dB^2)$ due to van der Hoeven (2020) by a factor of $d$, the number of classes.