Goto

Collaborating Authors

 Learning Management


Online Learning with Gaussian Payoffs and Side Observations

Neural Information Processing Systems

We consider a sequential learning problem with Gaussian payoffs and side observations: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i,j) (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).




Locally-Adaptive Nonparametric Online Learning: Supplementary Material

Neural Information Processing Systems

Next, we consider two algorithms for the problem of prediction with expert advice over trees. The following lemma (whose proof is deferred to Appendix D) formally states this fact. Suppose that Algorithm 5 is run using predictions and updates provided by Algorithm 6. Suppose that Algorithm 5 is run using predictions and updates provided by AdaNormalHedge. We start by proving a master regret bound that can be specialized to various settings of interest. Combining terms completes the proof.



Online Gradient Boosting

Neural Information Processing Systems

We extend the theory of boosting for regression problems to the online learning setting. Generalizing from the batch setting for boosting, the notion of a weak learning algorithm is modeled as an online learning algorithm with linear loss functions that competes with a base class of regression functions, while a strong learning algorithm is an online learning algorithm with smooth convex loss functions that competes with a larger class of regression functions. Our main result is an online gradient boosting algorithm that converts a weak online learning algorithm into a strong one where the larger class of functions is the linear span of the base class. We also give a simpler boosting algorithm that converts a weak online learning algorithm into a strong one where the larger class of functions is the convex hull of the base class, and prove its optimality.


DUOL: A Double Updating Approach for Online Learning

Neural Information Processing Systems

In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms.


Adaptive Market Making via Online Learning

Neural Information Processing Systems

We consider the design of strategies for \emph{market making} in a market like a stock, commodity, or currency exchange. In order to obtain profit guarantees for a market maker one typically requires very particular stochastic assumptions on the sequence of price fluctuations of the asset in question. We propose a class of spread-based market making strategies whose performance can be controlled even under worst-case (adversarial) settings. We prove structural properties of these strategies which allows us to design a master algorithm which obtains low regret relative to the best such strategy in hindsight. We run a set of experiments showing favorable performance on real-world price data.


Online learning in episodic Markovian decision processes by relative entropy policy search

Neural Information Processing Systems

We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space $\A$ and the state space $\X$ has a layered structure with $L$ layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after $T$ episodes is $2\sqrt{L\nX\nA T\log(\nX\nA/L)}$ in the bandit setting and $2L\sqrt{T\log(\nX\nA/L)}$ in the full information setting. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions.


Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Neural Information Processing Systems

We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves O(\sqrt{T\log \Pi } \log \Pi) regret with respect to a comparison set of policies \Pi . The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set \Pi has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem.