AITopics | Lattimore, Tor

Collaborating Authors

Lattimore, Tor

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Online Newton Method for Bandit Convex Optimisation

Fokkema, Hidde, van der Hoeven, Dirk, Lattimore, Tor, Mayo, Jack J.

arXiv.org Machine LearningJun-10-2024

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

2406.06506

Country: Europe > Netherlands (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.42)

Add feedback

Bandit Convex Optimisation

Lattimore, Tor

arXiv.org Artificial IntelligenceFeb-9-2024

Bandit problems are most commonly thought of in terms of sequential decision making or as an elementary version of reinforcement learning. This is certainly true, but they are also intimately connected with optimisation. These notes focus on the convex bandit problem, where the set of actions is a convex subset of euclidean space and the function mapping actions to losses is convex. The learner chooses actions and observes the corresponding loss, possibly with additive noise. The duty of the learner is to minimise the loss. When phrased like this the problem seems more like zeroth-order optimisation and these notes largely take that viewpoint. We do borrow two key notions from the bandit community that give the algorithms and analysis a unique flavour: We focus on the cumulative regret as a performance criterion; and We (mostly) consider the adversarial setting, where the loss function is changing from one query to the next. These differences mean the algorithms and theorems presented here are often not directly comparable to standard settings in the optimisation literature. Generally speaking, the theorems presented here hold with fewer assumptions and consequentially are sometimes weaker.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2402.06535

Country:

North America > United States (0.14)
Europe > France (0.14)
Asia > Middle East (0.14)

Genre:

Summary/Review (0.92)
Research Report (0.81)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Context-lumpable stochastic bandits

Lee, Chung-Wei, Liu, Qinghua, Abbasi-Yadkori, Yasin, Jin, Chi, Lattimore, Tor, Szepesvári, Csaba

arXiv.org Artificial IntelligenceNov-27-2023

We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t=1,2,\dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r\le \min\{S,K\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde O(r (S +K )/\epsilon^2)$ samples with high probability and provide a matching $\Omega(r(S+K)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde O(\sqrt{r^3(S+K)T})$. To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and $\widetilde O(\sqrt{{poly}(r)(S+K)T})$ minimax regret in the online setting for this problem. We also show our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.

bandit, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.13053

Country:

North America > United States > California (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

Probabilistic Inference in Reinforcement Learning Done Right

Tarbouriech, Jean, Lattimore, Tor, O'Donoghue, Brendan

arXiv.org Artificial IntelligenceNov-22-2023

A popular perspective in Reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2311.13294

Country: North America > United States (0.14)

Genre: Research Report (0.81)

Add feedback

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Kirschner, Johannes, Lattimore, Tor, Krause, Andreas

arXiv.org Machine LearningNov-13-2023

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent results on the linear formulation of partial monitoring that naturally generalizes the standard linear bandit setting. The main result is that a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games. We present a simple and unified analysis of stochastic partial monitoring, and further extend the model to the contextual and kernelized setting.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2302.03683

Country:

North America (0.45)
Europe (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Add feedback

Leveraging Demonstrations to Improve Online Learning: Quality Matters

Hao, Botao, Jain, Rahul, Lattimore, Tor, Van Roy, Benjamin, Wen, Zheng

arXiv.org Artificial IntelligenceMay-17-2023

We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrapping and show substantial empirical regret reduction through experiments.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2302.03319

Country: North America > United States (0.67)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.84)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Sequential Best-Arm Identification with Application to Brain-Computer Interface

Zhou, Xin, Hao, Botao, Kang, Jian, Lattimore, Tor, Li, Lexin

arXiv.org Artificial IntelligenceMay-17-2023

A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system. It allows individuals to interact with the device using only their thoughts, and holds immense potential for a wide range of applications in medicine, rehabilitation, and human augmentation. An electroencephalogram (EEG) and event-related potential (ERP)- based speller system is a type of BCI that allows users to spell words without using a physical keyboard, but instead by recording and interpreting brain signals under different stimulus presentation paradigms. Conventional non-adaptive paradigms treat each word selection independently, leading to a lengthy learning process. To improve the sampling efficiency, we cast the problem as a sequence of best-arm identification tasks in multi-armed bandits. Leveraging pre-trained large language models (LLMs), we utilize the prior knowledge learned from previous tasks to inform and facilitate subsequent tasks. To do so in a coherent way, we propose a sequential top-two Thompson sampling (STTS) algorithm under the fixed-confidence setting and the fixed-budget setting. We study the theoretical property of the proposed algorithm, and demonstrate its substantial empirical improvement through both synthetic data analysis as well as a P300 BCI speller simulator example.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.11908

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(2 more...)

Add feedback

A Second-Order Method for Stochastic Bandit Convex Optimisation

Lattimore, Tor, György, András

arXiv.org Artificial IntelligenceFeb-10-2023

The main contribution is the following regret guarantee for a simple algorithm for which the computation per round is dominated by finding the eigendecomposition of a d d matrix.

artificial intelligence, machine learning, probability, (19 more...)

arXiv.org Artificial Intelligence

2302.05371

Country: Europe (0.28)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

Amani, Sanae, Lattimore, Tor, György, András, Yang, Lin F.

arXiv.org Artificial IntelligenceDec-7-2022

We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure.

artificial intelligence, communication cost, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2205.1317

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Regret Bounds for Information-Directed Reinforcement Learning

Hao, Botao, Lattimore, Tor

arXiv.org Artificial IntelligenceNov-24-2022

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for Markov Decision Processes (MDPs) is still limited. We develop novel information-theoretic tools to bound the information ratio and cumulative information gain about the learning target. Our theoretical results shed light on the importance of choosing the learning target such that the practitioners can balance the computation and regret bounds. As a consequence, we derive prior-free Bayesian regret bounds for vanilla-IDS which learns the whole environment under tabular finite-horizon MDPs. In addition, we propose a computationally-efficient regularized-IDS that maximizes an additive form rather than the ratio form and show that it enjoys the same regret bound as vanilla-IDS. With the aid of rate-distortion theory, we improve the regret bound by learning a surrogate, less informative environment. Furthermore, we extend our analysis to linear MDPs and prove similar regret bounds for Thompson sampling as a by-product.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2206.0464

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Add feedback