Learning Management
Online Learning under Delayed Feedback
Joulani, Pooria, György, András, Szepesvári, Csaba
Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.
Online Learning with Switching Costs and Other Adaptive Adversaries
Cesa-Bianchi, Nicolo, Dekel, Ofer, Shamir, Ohad
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize ---in a nearly complete manner--- the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(\sqrt{T})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force $\widetilde{\Theta}(T^{2/3})$ regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
Normalized Online Learning
Ross, Stephane, Mineiro, Paul, Langford, John
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.
Online Learning in a Contract Selection Problem
In an online contract selection problem there is a seller which offers a set of contracts to sequentially arriving buyers whose types are drawn from an unknown distribution. If there exists a profitable contract for the buyer in the offered set, i.e., a contract with payoff higher than the payoff of not accepting any contracts, the buyer chooses the contract that maximizes its payoff. In this paper we consider the online contract selection problem to maximize the sellers profit. Assuming that a structural property called ordered preferences holds for the buyer's payoff function, we propose online learning algorithms that have sub-linear regret with respect to the best set of contracts given the distribution over the buyer's type. This problem has many applications including spectrum contracts, wireless service provider data plans and recommendation systems.
On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
Kar, Purushottam, Sriperumbudur, Bharath K, Jain, Prateek, Karnick, Harish C
In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al (COLT 2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, in order to complement our generalization bounds, we propose a novel memory efficient online learning algorithm for higher order learning problems with bounded regret guarantees.
Towards Pareto Descent Directions in Sampling Experts for Multiple Tasks in an On-Line Learning Paradigm
Ghosh, Shaona (University of Southampton,UK) | Lovell, Chris (University of Southampton) | Gunn, Steve R. (University of Southampton)
In many real-life design problems, there is a requirement to simultaneously balance multiple tasks or objectives in the system that are conflicting in nature, where minimizing one objective causes another to increase in value, thereby resulting in trade-offs between the objectives. For example, in embedded multi-core mobile devices and very large scale data centers, there is a continuous problem of simultaneously balancing interfering goals of maximal power savings and minimal performance delay with varying trade-off values for different application workloads executing on them. Typically, the optimal trade-offs for the executing workloads, lie on a difficult to determine optimal Pareto front. The nature of the problem requires learning over the lifetime of the mobile device or server with continuous evaluation and prediction of the trade-off settings on the system that balances the interfering objectives optimally. Towards this, we propose an on-line learning method, where the weights of experts for addressing the objectives are updated based on a convex combination of their relative performance in addressing all objectives simultaneously. An additional importance vector that assigns relative importance to each objective at every round is used, and is sampled from a convex cone pointed at the origin Our preliminary results show that the convex combination of the importance vector and the gradient of the potential functions of the learner's regret with respect to each objective ensure that in the next round, the drift (instantaneous regret vector), is the Pareto descent direction that enables better convergence to the optimal Pareto front.
Towards Pareto Descent Directions in Sampling Experts for Multiple Tasks in an On-Line Learning Paradigm
Ghosh, Shaona (University of Southampton,UK) | Lovell, Chris (University of Southampton) | Gunn, Steve R. (University of Southampton)
In many real-life design problems, there is a requirement to simultaneously balance multiple tasks or objectives in the system that are conflicting in nature, where minimizing one objective causes another to increase in value, thereby resulting in trade-offs between the objectives. For example, in embedded multi-core mobile devices and very large scale data centers, there is a continuous problem of simultaneously balancing interfering goals of maximal power savings and minimal performance delay with varying trade-off values for different application workloads executing on them. Typically, the optimal trade-offs for the executing workloads, lie on a difficult to determine optimal Pareto front. The nature of the problem requires learning over the lifetime of the mobile device or server with continuous evaluation and prediction of the trade-off settings on the system that balances the interfering objectives optimally. Towards this, we propose an on-line learning method, where the weights of experts for addressing the objectives are updated based on a convex combination of their relative performance in addressing all objectives simultaneously. An additional importance vector that assigns relative importance to each objective at every round is used, and is sampled from a convex cone pointed at the origin Our preliminary results show that the convex combination of the importance vector and the gradient of the potential functions of the learner's regret with respect to each objective ensure that in the next round, the drift (instantaneous regret vector), is the Pareto descent direction that enables better convergence to the optimal Pareto front.
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Abbasi-Yadkori, Yasin, Bartlett, Peter L., Szepesvari, Csaba
We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an algorithm whose regret with respect to any policy in a comparison class grows as the square root of the number of rounds of the game, provided the transition probabilities satisfy a uniform mixing condition. Our approach is efficient as long as the comparison class is polynomial and we can compute expectations over sample paths for each policy. Designing an efficient algorithm with small regret for the general case remains an open problem.
Online Learning with Pairwise Loss Functions
Wang, Yuyang, Khardon, Roni, Pechyony, Dmitry, Jones, Rosie
Efficient online learning with pairwise loss functions is a crucial component in building large-scale learning system that maximizes the area under the Receiver Operator Characteristic (ROC) curve. In this paper we investigate the generalization performance of online learning algorithms with pairwise loss functions. We show that the existing proof techniques for generalization bounds of online algorithms with a univariate loss can not be directly applied to pairwise losses. In this paper, we derive the first result providing data-dependent bounds for the average risk of the sequence of hypotheses generated by an arbitrary online learner in terms of an easily computable statistic, and show how to extract a low risk hypothesis from the sequence. We demonstrate the generality of our results by applying it to two important problems in machine learning. First, we analyze two online algorithms for bipartite ranking; one being a natural extension of the perceptron algorithm and the other using online convex optimization. Secondly, we provide an analysis for the risk bound for an online algorithm for supervised metric learning.
Confusion-Based Online Learning and a Passive-Aggressive Scheme
This paper provides the first ---to the best of our knowledge--- analysis of online learning algorithms for multiclass problems when the {\em confusion} matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e. concentration inequalities that apply to matrices, and more precisely to matrix martingales. We do establish generalization bounds for online learning algorithm and show how the theoretical study motivate the proposition of a new confusion-friendly learning procedure. This learning algorithm, called \copa (for COnfusion Passive-Aggressive) is a passive-aggressive learning algorithm; it is shown that the update equations for \copa can be computed analytically, thus allowing the user from having to recours to any optimization package to implement it.