AITopics

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.15)

Industry: Education > Educational Setting (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Dec-31-2016

Refined Lower Bounds for Adversarial Bandits

Gerchinovitz, Sébastien, Lattimore, Tor

We provide new lower bounds on the regret that must be suffered by adversarial bandit algorithms. The new results show that recent upper bounds that either (a) hold with high probability, (b) depend on the total loss of the best arm, or (c) depend on the quadratic variation of the losses are close to tight. In addition, we prove two impossibility results. First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case. Second, the regret cannot scale with the effective range of the losses. In contrast, both results are possible in the full-information setting.

algorithm, artificial intelligence, big data, (16 more...)

Country:

Europe (0.68)
North America > Canada > Alberta (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Data Science > Data Mining > Big Data (0.69)

arXiv.org Machine Learning, Oct-14-2016

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Lattimore, Tor, Szepesvari, Csaba

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. While prior work has mostly been in the worst-case setting, we analyse the asymptotic instance-dependent regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate, and indeed, can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example in generalised linear bandits and reinforcement learning.

algorithm, artificial intelligence, machine learning, (18 more...)

1610.04491

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jun-10-2016

Causal Bandits: Learning Good Interventions via Causal Inference

Lattimore, Finnian, Lattimore, Tor, Reid, Mark D.

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.

algorithm, artificial intelligence, big data, (19 more...)

1606.03203

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Data Science > Data Mining > Big Data (0.52)

arXiv.org Artificial Intelligence, Jun-3-2016

Thompson Sampling is Asymptotically Optimal in General Environments

Leike, Jan, Lattimore, Tor, Orseau, Laurent, Hutter, Marcus

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.

artificial intelligence, reinforcement learning, thompson, (17 more...)

1602.07905

Country: North America > Canada > Alberta (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)

arXiv.org Machine Learning, May-27-2016

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Lattimore, Tor

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

big data, gittins index, health & medicine, (20 more...)

1511.06014

Country: North America > Canada > Alberta (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.55)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, May-6-2016

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

Lattimore, Tor

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.

algorithm, artificial intelligence, health & medicine, (15 more...)

1603.08661

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

arXiv.org Machine Learning, Feb-12-2016

Conservative Bandits

Wu, Yifan, Shariff, Roshan, Lattimore, Tor, Szepesvári, Csaba

We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining their revenue above a fixed baseline, uniformly over time. While previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, the algorithms previously proposed are unsuitable due to their design under the more stringent constraints. We consider both the stochastic and the adversarial settings, where we propose natural, yet novel strategies and analyze the price for maintaining the constraints. Amongst other things, we prove both high-probability and expectation bounds on the regret, and we also consider maintaining the constraints either with high probability or in expectation. For the adversarial setting the price of maintaining the constraint appears to be higher, at least for the algorithm considered. A lower bound is given showing that the algorithm for the stochastic setting is almost optimal. Empirical results obtained in synthetic environments complement our theoretical findings.
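
To make the "uniformly over time" requirement concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm: the function name `satisfies_baseline`, the tolerance parameter `alpha`, and the exact form of the constraint (cumulative reward at least `(1 - alpha) * t * baseline` after every round `t`) are assumptions introduced for this example.

```python
from itertools import accumulate


def satisfies_baseline(rewards, baseline, alpha):
    """Check a revenue constraint uniformly over time (hypothetical form).

    Returns True only if, after every round t, the cumulative reward is at
    least (1 - alpha) * t * baseline. `alpha` is an assumed tolerance.
    """
    totals = accumulate(rewards)  # running sum of rewards after each round
    return all(
        total >= (1 - alpha) * t * baseline
        for t, total in enumerate(totals, start=1)
    )


# A sequence that ends well above the baseline can still violate the
# uniform constraint if it starts with an unlucky exploratory round.
steady = [1.0, 1.0, 1.0]
risky = [0.0, 2.0, 2.0]
print(satisfies_baseline(steady, baseline=1.0, alpha=0.1))
print(satisfies_baseline(risky, baseline=1.0, alpha=0.1))
```

The second sequence would pass a check made only at the final round, which is the weaker fixed-time requirement the abstract contrasts with.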

artificial intelligence, big data, constraint, (17 more...)

1602.04282

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Dec-31-2015

The Pareto Regret Frontier for Bandits

Lattimore, Tor

Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions. I show that the price for such unbalanced worst-case regret guarantees is rather high. Specifically, if an algorithm enjoys a worst-case regret of B with respect to some action, then there must exist another action for which the worst-case regret is at least Ω(nK/B), where n is the horizon and K the number of actions. I also give upper bounds in both the stochastic and adversarial settings showing that this result cannot be improved. For the stochastic case the Pareto regret frontier is characterised exactly up to constant factors.
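
The trade-off Ω(nK/B) can be illustrated with a short arithmetic sketch in Python. The function name `pareto_lower_bound` and the values of n, K and B are hypothetical, and the constant hidden by Ω is omitted, so the numbers show the order of growth only.

```python
def pareto_lower_bound(n: int, k: int, b: float) -> float:
    """Return n*K/B, the order of the regret forced on some other action.

    Constants hidden by the Omega notation are omitted (an assumption of
    this sketch), so the result is an order of magnitude, not a bound.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    return n * k / b


n, k = 10_000, 10            # hypothetical horizon and number of actions
balanced = (n * k) ** 0.5    # the value of B at which nK/B equals B

for b in (50.0, balanced, 1000.0):
    other = pareto_lower_bound(n, k, b)
    print(f"B = {b:8.1f}  ->  some other action suffers about {other:8.1f}")
```

Choosing B below sqrt(nK) for one action pushes the regret for some other action above sqrt(nK), which is the sense in which unbalanced guarantees are costly.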

artificial intelligence, big data, worst-case regret, (15 more...)

Country: North America > Canada > Alberta (0.14)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)

Neural Information Processing Systems, Dec-31-2015

Linear Multi-Resource Allocation with Semi-Bandit Feedback

Lattimore, Tor, Crammer, Koby, Szepesvari, Csaba

We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contribution is the new setting and an algorithm with nearly-optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present some new results for stochastic linear bandits on the hypercube that significantly outperform existing work, especially in the sparse case.

algorithm, artificial intelligence, linear bandit, (14 more...)

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)