Lattimore, Tor
Contextual Information-Directed Sampling
Hao, Botao, Lattimore, Tor, Qin, Chao
Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it remains unclear what the right form of the information ratio is to optimize when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more in actions that are beneficial for future unseen contexts, whereas conditional IDS can be myopic. We further propose a computationally efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.
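The core computation in IDS admits a compact sketch: over a finite action set, choose the distribution minimising squared expected regret divided by expected information gain, and the minimiser is known to be supported on at most two actions (Russo and Van Roy, 2014). Below is a minimal illustration in Python; the per-action regret and information-gain estimates are assumed to come from the agent's posterior, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def ids_distribution(delta, gain, grid=1000):
    """Distribution over k actions minimising the information ratio
    (expected regret)^2 / (expected information gain).

    delta[i]: estimated expected regret of action i
    gain[i]:  estimated expected information gain of action i
    The minimiser is supported on at most two actions, so a search
    over action pairs with a grid over mixing weights suffices.
    """
    k = len(delta)
    p = np.linspace(0.0, 1.0, grid)
    best_ratio, best_q = np.inf, None
    for i in range(k):
        for j in range(k):
            reg = p * delta[i] + (1 - p) * delta[j]
            info = p * gain[i] + (1 - p) * gain[j]
            ratio = np.full_like(p, np.inf)
            mask = info > 0
            ratio[mask] = reg[mask] ** 2 / info[mask]
            m = int(np.argmin(ratio))
            if ratio[m] < best_ratio:
                best_ratio = ratio[m]
                best_q = np.zeros(k)
                best_q[i] += p[m]
                best_q[j] += 1 - p[m]
    return best_q, best_ratio
```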
Variational Bayesian Optimistic Sampling
O'Donoghue, Brendan, Lattimore, Tor
We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian `optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem with $A$ actions after $T$ rounds. We extend the regret analysis for optimistic policies to bilinear saddle-point problems, which include zero-sum matrix games and constrained bandits as special cases. In this case we show that Thompson sampling can produce policies outside of the optimistic set and suffer linear regret in some instances. Finding a policy inside the optimistic set amounts to solving a convex optimization problem, and we call the resulting algorithm `variational Bayesian optimistic sampling' (VBOS). The procedure works for any posteriors, i.e., it does not require the posterior to have any special properties, such as log-concavity, unimodality, or smoothness. The variational view of the problem has many useful properties, including the ability to tune the exploration-exploitation tradeoff, add regularization, incorporate constraints, and linearly parameterize the policy.
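As a concrete illustration of one member of the optimistic set, the Thompson sampling policy for the multi-armed bandit case can be estimated by Monte Carlo from arbitrary posterior samples, consistent with the point that no special posterior structure is required. The sketch below is illustrative and not code from the paper.

```python
import numpy as np

def thompson_policy(posterior_samples):
    """Estimate the Thompson sampling policy pi(a) = P(arm a is optimal)
    from Monte Carlo posterior samples of shape (n_samples, n_arms)."""
    best = np.argmax(posterior_samples, axis=1)
    n_arms = posterior_samples.shape[1]
    return np.bincount(best, minlength=n_arms) / len(best)

# e.g. independent Gaussian posteriors over three arm means
rng = np.random.default_rng(0)
samples = rng.normal(loc=[0.1, 0.5, 0.4], scale=0.2, size=(10000, 3))
print(thompson_policy(samples))
```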
Bandit Phase Retrieval
Lattimore, Tor, Hao, Botao
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$, which improves on the best known bounds by a factor of $\smash{\sqrt{d}}$. We also show that the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling are not sufficient for optimal regret.
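The reward model is easy to state in code. The sketch below simulates the bandit environment; the unit-variance Gaussian noise is an assumption for illustration, since the abstract only fixes the mean reward $\langle A_t, \theta_\star\rangle^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown parameter vector

def pull(action):
    """Reward with mean <action, theta_star>^2. Unit Gaussian noise is
    an illustrative assumption; the abstract only specifies the mean."""
    assert np.linalg.norm(action) <= 1 + 1e-9   # actions in the unit ball
    return np.dot(action, theta_star) ** 2 + rng.normal()
```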
Information Directed Sampling for Sparse Linear Bandits
Hao, Botao, Lattimore, Tor, Deng, Wei
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off. We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances, demonstrating the adaptivity of IDS. To efficiently implement sparse IDS, we propose an empirical Bayesian approach for sparse posterior sampling using a spike-and-slab Gaussian-Laplace prior. Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
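For intuition about the prior, here is one hedged reading of a spike-and-slab Gaussian-Laplace prior: each coordinate is drawn from a narrow Gaussian spike with high probability and from a heavy-tailed Laplace slab otherwise. The exact mixture used in the paper may differ, and all parameters below are illustrative.

```python
import numpy as np

def sample_spike_slab(d, p_slab=0.1, spike_sd=0.01, slab_scale=1.0, seed=0):
    """Draw theta from a spike-and-slab prior: each coordinate is a
    narrow Gaussian 'spike' w.p. 1 - p_slab and a Laplace 'slab'
    otherwise. Illustrative only; the paper pins down the exact form."""
    rng = np.random.default_rng(seed)
    slab = rng.random(d) < p_slab
    theta = rng.normal(scale=spike_sd, size=d)            # spike draws
    theta[slab] = rng.laplace(scale=slab_scale, size=slab.sum())
    return theta
```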
Asymptotically Optimal Information-Directed Sampling
Kirschner, Johannes, Lattimore, Tor, Vernade, Claire, Szepesvári, Csaba
We introduce a computationally efficient algorithm for finite stochastic linear bandits. The approach is based on the frequentist information-directed sampling (IDS) framework, with an information gain potential that is derived directly from the asymptotic regret lower bound. We establish frequentist regret bounds, which show that the proposed algorithm is both asymptotically optimal and worst-case rate optimal in finite time. Our analysis sheds light on how IDS trades off regret and information to incrementally solve the semi-infinite concave program that defines the optimal asymptotic regret. Along the way, we uncover interesting connections to a recently proposed two-player game approach and the Bayesian IDS algorithm.
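For reference, the program alluded to here is usually stated as follows for finite stochastic linear bandits with suboptimality gaps $\Delta(a)$; this is the standard formulation from the asymptotic lower-bound literature, not a quote from the paper.

```latex
% Standard asymptotic lower-bound program for finite linear bandits
% (the usual formulation; the paper's exact statement may differ).
\[
  c(\theta) = \min_{\alpha \ge 0} \sum_{a} \alpha(a)\, \Delta(a)
  \quad \text{s.t.} \quad
  \|a\|^2_{H(\alpha)^{-1}} \le \frac{\Delta(a)^2}{2}
  \;\; \text{for all suboptimal } a,
  \qquad H(\alpha) = \sum_{a} \alpha(a)\, a a^\top .
\]
```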
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Hao, Botao, Duan, Yaqi, Lattimore, Tor, Szepesvári, Csaba, Wang, Mengdi
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL more sample-efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension. To reduce the Lasso bias, we further propose a post-model-selection estimator that applies fitted Q-evaluation to the features selected via group Lasso. Under an additional signal-strength assumption, we derive a sharper instance-dependent error bound that depends on a divergence function measuring the distribution mismatch between the data distribution and the occupancy measure of the target policy. Further, we study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound depending on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data's covariance matrix. Finally, we complement the results with minimax lower bounds for batch-data policy evaluation/optimization that nearly match our upper bounds. The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
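A minimal sketch of Lasso fitted Q-evaluation under linear function approximation: repeatedly regress Bellman targets onto features with an $\ell_1$ penalty. The feature matrices, regularisation level, and iteration count below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqe(features, rewards, next_features, gamma=0.99,
              n_iters=50, lam=0.1):
    """Lasso fitted Q-evaluation sketch.

    features:      phi(s_i, a_i), shape (n, d), from the batch data
    next_features: phi(s'_i, pi(s'_i)), shape (n, d), for the target
                   policy pi (assumed precomputed)
    """
    d = features.shape[1]
    w = np.zeros(d)
    for _ in range(n_iters):
        targets = rewards + gamma * next_features @ w    # Bellman targets
        model = Lasso(alpha=lam, fit_intercept=False).fit(features, targets)
        w = model.coef_
    return w   # Q(s, a) is approximated by phi(s, a) . w
```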
High-Dimensional Sparse Linear Bandits
Hao, Botao, Lattimore, Tor, Wang, Mengdi
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime, where the horizon is smaller than the ambient dimension and the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm, showing that $\Theta(n^{2/3})$ is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
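The explore-then-commit scheme can be sketched in a few lines: explore for roughly $n^{2/3}$ rounds, fit a Lasso estimate of the parameter, then commit to the greedy action. The uniform exploration distribution, regularisation level, and interface below are simplifying assumptions; the paper requires a well-conditioned exploration distribution.

```python
import numpy as np
from sklearn.linear_model import Lasso

def explore_then_commit(env, arms, n, lam=0.1, seed=0):
    """Explore-then-commit sketch for sparse linear bandits.

    env:  callable mapping an arm (d-vector) to a noisy reward
    arms: array of shape (K, d) of available feature vectors
    """
    rng = np.random.default_rng(seed)
    n_explore = int(np.ceil(n ** (2 / 3)))
    X = arms[rng.integers(len(arms), size=n_explore)]     # uniform exploration
    y = np.array([env(a) for a in X])
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    best = arms[np.argmax(arms @ theta_hat)]              # commit greedily
    total = y.sum() + sum(env(best) for _ in range(n - n_explore))
    return theta_hat, total
```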
Gaussian Gated Linear Networks
Budden, David, Marblestone, Adam, Sezener, Eren, Lattimore, Tor, Wayne, Greg, Veness, Joel
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks. Instead of using backpropagation to learn features, GLNs have a distributed and local credit assignment mechanism based on optimizing a convex objective. This gives rise to many desirable properties including universality, data-efficient online learning, trivial interpretability and robustness to catastrophic forgetting. We extend the GLN framework from classification to multiple regression and density modelling by generalizing geometric mixing to a product of Gaussian densities. The G-GLN achieves competitive or state-of-the-art performance on several univariate and multivariate regression benchmarks, and we demonstrate its applicability to practical tasks including online contextual bandits and density estimation via denoising.
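The generalisation of geometric mixing to Gaussians rests on a standard identity: a renormalised weighted product of Gaussian densities is again Gaussian, with precision equal to the weighted sum of the experts' precisions. A minimal sketch with illustrative names:

```python
import numpy as np

def gaussian_geometric_mix(mus, sigmas, weights):
    """Geometric mixture of Gaussian experts: the renormalised product
    prod_i N(mu_i, sigma_i^2)^{w_i} is Gaussian with precision equal to
    the weighted sum of precisions and a precision-weighted mean."""
    prec = weights / sigmas ** 2          # weighted per-expert precision
    tau = prec.sum()                      # mixture precision
    mu = (prec * mus).sum() / tau         # precision-weighted mean
    return mu, np.sqrt(1.0 / tau)

# e.g. two experts; the more confident expert dominates the mixture
print(gaussian_geometric_mix(np.array([0.0, 1.0]),
                             np.array([1.0, 0.1]),
                             np.array([0.5, 0.5])))
```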
Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
Lattimore, Tor
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
Mirror Descent and the Information Ratio
Lattimore, Tor, György, András
The combination of minimax duality and the information-theoretic machinery developed by Russo and Van Roy [2014] has yielded a series of elegant arguments bounding the minimax regret for a variety of regret minimisation problems. The downside is that the application of minimax duality makes the approach non-constructive: the existence of certain policies is established without identifying what those policies are. Our main contribution is to show that the information-theoretic machinery can be translated in a natural way to the language of online linear optimisation, yielding explicit policies. These policies are not guaranteed to be efficient, however: they must solve a convex optimisation problem that may be infinite-dimensional. Nevertheless, the translation provides a clear path towards algorithm design and/or improved bounds, as we illustrate with an application to finite-armed bandits. To maximise generality, our results are stated using the linear partial monitoring framework, which is flexible enough to model most classical setups.