Shalev-Shwartz, Shai
Mind the Duality Gap: Logarithmic regret algorithms for online optimization
Shalev-Shwartz, Shai, Kakade, Sham M.
We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for Follow-The-Leader and for the gradient descent algorithm of Hazan et al. (2006). We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework to generalized strongly convex functions.
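As a rough illustration of the gradient-descent end of this spectrum, here is a minimal sketch of online gradient descent for strongly convex losses with step size 1/(sigma*t), the scheme of Hazan et al. (2006) whose logarithmic regret the primal-dual framework recovers; the function name, the `sigma` parameter, and the projection-free simplification are ours, not the paper's.

```python
import numpy as np

def online_gradient_descent(grad_fns, dim, sigma=1.0):
    """Online gradient descent for sigma-strongly convex loss functions.

    grad_fns yields, per round t, a callable mapping the current iterate to a
    (sub)gradient of that round's loss. The decaying step size 1/(sigma * t)
    is what yields the O(log T) regret bound; this sketch omits the projection
    onto a bounded feasible set that the full analysis generally requires.
    """
    w = np.zeros(dim)
    for t, grad in enumerate(grad_fns, start=1):
        w = w - (1.0 / (sigma * t)) * grad(w)
    return w
```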
ShareBoost: Efficient multiclass learning with feature sharing
Shalev-Shwartz, Shai, Wexler, Yonatan, Shashua, Amnon
Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only a few features; in particular, the number of features used should increase sub-linearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efficiently finds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the benefits of ShareBoost and evaluate its success relative to other state-of-the-art approaches.
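The following is a heavily simplified, hypothetical sketch of the greedy shared-feature idea: each round adds the one feature that most reduces the training loss, and all classes draw on the same selected set. The squared loss and least-squares refit are our substitutions for brevity; ShareBoost's actual loss and selection criterion differ.

```python
import numpy as np

def shared_feature_selection(X, y, num_classes, rounds):
    """Greedy shared-feature selection for a multiclass linear predictor.

    Hypothetical simplification of the ShareBoost idea: score every unused
    feature by its gradient magnitude summed over all classes (so a feature
    useful for many classes scores high), add the best one, and refit the
    weights over the selected features only.
    """
    n, d = X.shape
    Y = np.eye(num_classes)[y]          # one-hot labels, shape (n, k)
    selected = []
    W = np.zeros((d, num_classes))
    for _ in range(rounds):
        residual = Y - X @ W
        scores = np.abs(X.T @ residual).sum(axis=1)
        scores[selected] = -np.inf      # never re-select a feature
        selected.append(int(np.argmax(scores)))
        # Refit on the shared feature set (least squares, for brevity).
        W_s, *_ = np.linalg.lstsq(X[:, selected], Y, rcond=None)
        W = np.zeros((d, num_classes))
        W[selected] = W_s
    return W, selected
```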
Fast Rates for Regularized Objectives
Sridharan, Karthik, Shalev-Shwartz, Shai, Srebro, Nathan
We study convergence properties of empirical minimization of a stochastic strongly convex objective, where the stochastic component is linear. We show that the value attained by the empirical minimizer converges to the optimal value with rate 1/n. The result applies, in particular, to the SVM objective. Thus, we obtain a rate of 1/n on the convergence of the SVM objective (with fixed regularization parameter) to its infinite data limit. We demonstrate how this is essential for obtaining certain types of oracle inequalities for SVMs.
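Schematically, the 1/n result has the following shape (our notation, with constants suppressed; the precise conditions are as stated in the paper):

```latex
% Excess objective value of the empirical minimizer \hat{w} over n samples,
% for a \lambda-strongly convex objective F with a G-Lipschitz stochastic
% part. This is the shape of the bound, not a verbatim theorem statement.
\[
  \mathbb{E}\bigl[F(\hat{w})\bigr] - \min_{w} F(w)
  \;\le\; O\!\left(\frac{G^{2}}{\lambda n}\right)
\]
```

For the SVM objective, lambda plays the role of the fixed regularization parameter, which is why the rate holds for the objective value rather than for the risk directly.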
Convex Repeated Games and Fenchel Duality
Shalev-Shwartz, Shai, Singer, Yoram
We describe an algorithmic framework for an abstract game which we term a convex repeated game. We show that various online learning and boosting algorithms can all be derived as special cases of our algorithmic framework. This unified view explains the properties of existing algorithms and also enables us to derive several new interesting algorithms. Our algorithmic framework stems from a connection that we build between the notions of regret in game theory and weak duality in convex optimization.
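The underlying connection is Fenchel's weak duality; schematically, for convex functions f and g with conjugates f* and g* (notation ours, not lifted from the paper):

```latex
% Fenchel weak duality: the dual value never exceeds the primal value.
% f^{*} and g^{*} denote the Fenchel conjugates of f and g.
\[
  \inf_{w}\,\bigl(f(w) + g(w)\bigr)
  \;\ge\;
  \sup_{\theta}\,\bigl(-f^{*}(\theta) - g^{*}(-\theta)\bigr)
\]
```

Roughly, the framework casts the learner as incrementally increasing the dual (right-hand) side, so weak duality converts dual progress into a bound on regret.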
Online Classification for Complex Problems Using Simultaneous Projections
Amit, Yonatan, Shalev-Shwartz, Shai, Singer, Yoram
We describe and analyze an algorithmic framework for online classification where each online trial consists of multiple prediction tasks that are tied together. We tackle the problem of updating the online hypothesis by defining a projection problem in which each prediction task corresponds to a single linear constraint. These constraints are tied together through a single slack parameter. We then introduce a general method for approximately solving the problem by projecting simultaneously and independently onto each constraint, which corresponds to a prediction sub-problem, and then averaging the individual solutions. We show that this approach constitutes a feasible, albeit not necessarily optimal, solution for the original projection problem. We derive concrete simultaneous projection schemes and analyze them in the mistake bound model. We demonstrate the power of the proposed algorithm in experiments with online multiclass text categorization. Our experiments indicate that a combination of class-dependent features with the simultaneous projection method outperforms previously studied algorithms.
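A minimal sketch of the averaging-of-projections idea, assuming each prediction task is encoded as a halfspace constraint <w, x> >= b (our encoding and variable names; the paper's formulation also carries the shared slack term):

```python
import numpy as np

def simultaneous_projection_update(w, constraints):
    """One online update by simultaneous projections (illustrative sketch).

    Each constraint is a pair (x, b) encoding the linear requirement
    <w, x> >= b. We project w independently onto each violated constraint,
    using the closed-form Euclidean projection onto a halfspace, and then
    average the individual solutions.
    """
    projections = []
    for x, b in constraints:
        violation = b - np.dot(w, x)
        if violation > 0:
            # Euclidean projection of w onto {v : <v, x> >= b}
            projections.append(w + (violation / np.dot(x, x)) * x)
        else:
            projections.append(w)   # constraint already satisfied
    return np.mean(projections, axis=0)
```

Averaging feasible points of a convex set stays feasible, which is why the combined update is a feasible (if not optimal) solution to the joint projection problem.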
The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Dekel, Ofer, Shalev-Shwartz, Shai, Singer, Yoram
The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
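A minimal, hypothetical sketch of one budgeted kernel-Perceptron step in the Forgetron's spirit: on a mistake, add the new example, shrink the old coefficients, and evict the oldest stored point when over budget. The fixed shrinking factor `phi` is our simplification; the actual Forgetron chooses its scaling adaptively so that the mistake bound holds.

```python
def budgeted_perceptron_step(support, budget, x, y, kernel, phi=0.9):
    """One step of a budgeted kernel Perceptron (Forgetron-style sketch).

    `support` is a list of (coef, example) pairs defining the hypothesis
    f(x) = sum_i coef_i * K(example_i, x).
    """
    score = sum(c * kernel(xi, x) for c, xi in support)
    if y * score <= 0:                                  # mistake occurred
        support = [(phi * c, xi) for c, xi in support]  # shrink old weights
        support.append((y, x))                          # Perceptron addition
        if len(support) > budget:
            support.pop(0)                              # forget the oldest
    return support
```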
Online Passive-Aggressive Algorithms
Shalev-Shwartz, Shai, Crammer, Koby, Dekel, Ofer, Singer, Yoram
We present a unified view of online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst-case loss bounds for various algorithms, for both the realizable and the non-realizable case. A conversion of our main online algorithm to the batch learning setting is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge loss.
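For the classification case, the per-round update has a simple closed form, sketched below from the standard formulation: stay put when the margin constraint holds (passive), otherwise project onto it (aggressive). The optional `C` caps the step as in the soft-margin PA-I variant.

```python
import numpy as np

def passive_aggressive_update(w, x, y, C=None):
    """Passive-Aggressive update for binary classification (sketch).

    On hinge loss l = max(0, 1 - y * <w, x>), the new weight vector is the
    closest point to w that satisfies the margin constraint. Regression and
    uniclass admit analogous closed-form steps.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w                    # passive: constraint already satisfied
    tau = loss / np.dot(x, x)       # aggressive: project onto the constraint
    if C is not None:
        tau = min(tau, C)           # PA-I cap on the step size
    return w + tau * y * x
```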