Learning Management
Online Learning in The Manifold of Low-Rank Matrices
Shalit, Uri, Weinshall, Daphna, Chechik, Gal
When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O((n m)k) for a rank-k matrix of dimension m x n, given rank one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors.
Online Learning: Random Averages, Combinatorial Parameters, and Learnability
Rakhlin, Alexander, Sridharan, Karthik, Tewari, Ambuj
We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fat-shattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting.
Confusion-Based Online Learning and a Passive-Aggressive Scheme
This paper provides the first ---to the best of our knowledge--- analysis of online learning algorithms for multiclass problems when the {\em confusion} matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e. concentration inequalities that apply to matrices, and more precisely to matrix martingales. We do establish generalization bounds for online learning algorithm and show how the theoretical study motivate the proposition of a new confusion-friendly learning procedure. This learning algorithm, called \copa (for COnfusion Passive-Aggressive) is a passive-aggressive learning algorithm; it is shown that the update equations for \copa can be computed analytically, thus allowing the user from having to recours to any optimization package to implement it. Papers published at the Neural Information Processing Systems Conference.
Online Learning: Stochastic, Constrained, and Smoothed Adversaries
Rakhlin, Alexander, Sridharan, Karthik, Tewari, Ambuj
Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds.
Efficient Online Learning via Randomized Rounding
Cesa-bianchi, Nicolò, Shamir, Ohad
Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines random playout'' and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning. Papers published at the Neural Information Processing Systems Conference.
(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Thakurta, Abhradeep Guha, Smith, Adam
We provide a general technique for making online learning algorithms differentially private, in both the full information and bandit settings. Our technique applies to algorithms that aim to minimize a \emph{convex} loss function which is a sum of smaller convex loss terms, one for each data point. We modify the popular \emph{mirror descent} approach, or rather a variant called \emph{follow the approximate leader}. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work.
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Yadkori, Yasin Abbasi, Bartlett, Peter L., Kanade, Varun, Seldin, Yevgeny, Szepesvari, Csaba
We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves $O(\sqrt{T\log \Pi } \log \Pi)$ regret with respect to a comparison set of policies $\Pi$. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set $\Pi$ has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem.
Parameter-Free Online Learning via Model Selection
Foster, Dylan J., Kale, Satyen, Mohri, Mehryar, Sridharan, Karthik
We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R} {d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model.
Online Learning of Dynamic Parameters in Social Networks
Shahrampour, Shahin, Rakhlin, Sasha, Jadbabaie, Ali
This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state.
Online learning in episodic Markovian decision processes by relative entropy policy search
Zimin, Alexander, Neu, Gergely
We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space $\A$ and the state space $\X$ has a layered structure with $L$ layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after $T$ episodes is $2\sqrt{L X A T\log( X A/L)}$ in the bandit setting and $2L\sqrt{T\log( X A/L)}$ in the full information setting. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions.