AITopics | Kirschner, Johannes

Confidence Estimation via Sequential Likelihood Mixing

Kirschner, Johannes, Krause, Andreas, Meziu, Michele, Mutny, Mojmir

arXiv.org Machine Learning, Feb-20-2025

We present a universal framework for constructing confidence sets based on sequential likelihood mixing. Building upon classical results from sequential analysis, we provide a unifying perspective on several recent lines of work, and establish fundamental connections between sequential mixing, Bayesian inference and regret inequalities from online estimation. The framework applies to any realizable family of likelihood functions and allows for non-i.i.d. data and anytime validity. Moreover, the framework seamlessly integrates standard approximate inference techniques, such as variational inference and sampling-based methods, and extends to misspecified model classes, while preserving provable coverage guarantees. We illustrate the power of the framework by deriving tighter confidence sequences for classical settings, including sequential linear regression and sparse estimation, with simplified proofs.
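As a rough illustration of the likelihood-mixing idea in the abstract above, the following sketch builds a confidence set for a Bernoulli mean by comparing each candidate's likelihood against a uniform mixture of likelihoods over a finite grid (a classical method-of-mixtures construction). The function name `mixture_confidence_set`, the Bernoulli model, the grid, and the uniform mixing weights are assumptions made for illustration; they are not taken from the paper.

```python
import math


def mixture_confidence_set(xs, grid, delta=0.05):
    """Grid points theta whose likelihood is within a factor 1/delta of the mixture.

    xs    : observed 0/1 outcomes (hypothetical Bernoulli data)
    grid  : candidate parameters, each strictly between 0 and 1
    delta : allowed failure probability of the confidence sequence
    """
    n = len(xs)
    k = sum(xs)
    # Log-likelihood of the data under each candidate parameter.
    loglik = [k * math.log(th) + (n - k) * math.log(1.0 - th) for th in grid]
    # Log of the uniform mixture of likelihoods, computed via log-sum-exp.
    m = max(loglik)
    log_mix = m + math.log(sum(math.exp(l - m) for l in loglik) / len(grid))
    # Keep theta if mixture / L(theta) < 1/delta; by Ville's inequality the
    # true parameter is excluded at some time with probability at most delta.
    threshold = log_mix - math.log(1.0 / delta)
    return [th for th, l in zip(grid, loglik) if l >= threshold]
```

Calling the function again after each new observation yields a sequence of sets; in this sketch the uniform mixture plays the role of a prior, and other mixing weights could be substituted.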

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2502.14689

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

Zhang, Zichen, Kirschner, Johannes, Zhang, Junxi, Zanini, Francesco, Ayoub, Alex, Dehghan, Masood, Schuurmans, Dale

arXiv.org Machine Learning, Dec-14-2023

A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors respond differently to time discretization, leading to an optimal choice of temporal resolution for a given data budget. These findings show that managing the temporal resolution can provably improve policy evaluation efficiency in LQR systems with finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and standard RL benchmarks for non-linear continuous control.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2212.08949

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Kirschner, Johannes, Lattimore, Tor, Krause, Andreas

arXiv.org Machine Learning, Nov-13-2023

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent results on the linear formulation of partial monitoring that naturally generalizes the standard linear bandit setting. The main result is that a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games. We present a simple and unified analysis of stochastic partial monitoring, and further extend the model to the contextual and kernelized setting.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2302.03683

Country:

Europe (0.28)
North America > United States (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning

Tkachuk, Volodymyr, Bakhtiari, Seyed Alireza, Kirschner, Johannes, Jusup, Matej, Bogunovic, Ilija, Szepesvári, Csaba

arXiv.org Artificial Intelligence, Feb-8-2023

A practical challenge in reinforcement learning is posed by combinatorial action spaces, which make planning computationally demanding. For example, in cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a combinatorial blow-up in the action space by the number of agents. As a minimal requirement, we assume access to an argmax oracle that allows the greedy policy to be computed efficiently for any Q-function in the model class. Building on recent work in planning with local access to a simulator and linear function approximation, we propose efficient algorithms for this setting that lead to polynomial compute and query complexity in all relevant problem parameters. For the special case where the feature decomposition is additive, we further improve the bounds and extend the results to the kernelized setting with an efficient algorithm.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2302.04376

Country:

Europe (0.92)
North America > Canada (0.28)
North America > United States (0.27)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Near-optimal Policy Identification in Active Reinforcement Learning

Li, Xiang, Mehta, Viraj, Kirschner, Johannes, Char, Ian, Neiswanger, Willie, Schneider, Jeff, Krause, Andreas, Bogunovic, Ilija

arXiv.org Artificial Intelligence, Dec-19-2022

Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.

Reinforcement learning (RL) algorithms are increasingly applied to complex domains such as robotics (Kober et al., 2013), magnetic tokamaks (Seo et al., 2021; Degrave et al., 2022), and molecular search (Simm et al., 2020a;b). A central challenge in such environments is that data acquisition is often a time-consuming and expensive process, or may be infeasible due to safety considerations.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2212.0951

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Government > Regional Government (0.67)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bias-Robust Bayesian Optimization via Dueling Bandits

Kirschner, Johannes, Krause, Andreas

arXiv.org Machine Learning, Jun-9-2021

We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model. Then we propose a novel approach for dueling bandits based on information-directed sampling (IDS). Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees. Our analysis further generalizes a previously proposed semi-parametric linear bandit model to non-linear reward functions, and uncovers interesting links to doubly-robust estimation.

artificial intelligence, machine learning, optimization, (14 more...)

arXiv.org Machine Learning

2105.11802

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Asymptotically Optimal Information-Directed Sampling

Kirschner, Johannes, Lattimore, Tor, Vernade, Claire, Szepesvári, Csaba

arXiv.org Machine Learning, Nov-11-2020

We introduce a computationally efficient algorithm for finite stochastic linear bandits. The approach is based on the frequentist information-directed sampling (IDS) framework, with an information gain potential that is derived directly from the asymptotic regret lower bound. We establish frequentist regret bounds, which show that the proposed algorithm is both asymptotically optimal and worst-case rate optimal in finite time. Our analysis sheds light on how IDS trades off regret and information to incrementally solve the semi-infinite concave program that defines the optimal asymptotic regret. Along the way, we uncover interesting connections to a recently proposed two-player game approach and the Bayesian IDS algorithm.
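To make the regret-versus-information trade-off mentioned in the abstract above more concrete, here is a minimal sketch of a deterministic IDS-style selection rule: pick the action that minimizes the ratio of squared estimated regret to information gain. The function `ids_action` and its inputs (`gaps`, `info_gains`) are hypothetical; how the gap estimates and the information gain are actually defined in the paper is not reproduced here, and IDS is often stated over distributions on actions rather than single actions.

```python
import math


def ids_action(gaps, info_gains):
    """Index of the action minimizing (estimated gap)^2 / information gain.

    gaps       : estimated regret of each action (hypothetical inputs)
    info_gains : non-negative information gain of each action
    """
    best_action, best_ratio = None, math.inf
    for action, (gap, info) in enumerate(zip(gaps, info_gains)):
        if info <= 0.0:
            continue  # an action that reveals nothing has an unbounded ratio
        ratio = gap * gap / info
        if ratio < best_ratio:
            best_action, best_ratio = action, ratio
    return best_action


# The greedy choice (smallest gap) is action 0, but action 1 is far more
# informative, so the ratio rule prefers it.
chosen = ids_action([0.1, 0.5, 0.3], [0.01, 1.0, 0.05])
```

In this toy call the ratios are roughly 1.0, 0.25 and 1.8, so the rule accepts a larger immediate regret in exchange for more information.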

artificial intelligence, information gain, machine learning, (17 more...)

arXiv.org Machine Learning

2011.05944

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stochastic Bandits with Context Distributions

Kirschner, Johannes, Krause, Andreas

arXiv.org Machine Learning, Jun-6-2019

We introduce a novel stochastic contextual bandit model, where at each step the adversary chooses a distribution over a context set. The learner observes only the context distribution while the exact context realization remains hidden. This allows for a broader range of applications, for instance when the context itself is based on predictions. By adapting the UCB algorithm to this setting, we propose an algorithm that achieves a $\tilde{\mathcal{O}}(d\sqrt{T})$ high-probability regret bound for linearly parametrized reward functions. Our results strictly generalize previous work in the sense that both our model and the algorithm reduce to the standard setting when the environment chooses only Dirac delta distributions and therefore provides the exact context to the learner. We further obtain similar results for a variant where the learner observes the realized context after choosing the action, and we extend the results to the kernelized setting. Finally, we demonstrate the proposed method on synthetic and real-world datasets.

algorithm, artificial intelligence, big data, (18 more...)

arXiv.org Machine Learning

1906.02685

Country: Europe > Switzerland (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces

Kirschner, Johannes, Mutný, Mojmír, Hiller, Nicole, Ischebeck, Rasmus, Krause, Andreas

arXiv.org Machine Learning, Feb-8-2019

Bayesian optimization is known to be difficult to scale to high dimensions, because the acquisition step requires solving a non-convex optimization problem in the same search space. In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems. We show that our algorithm converges globally and obtains a fast local rate when the function is strongly convex. Further, if the objective has an invariant subspace, our method automatically adapts to the effective dimension without changing the algorithm. Our method scales well to high dimensions and makes use of a global Gaussian process model. When combined with the SafeOpt algorithm to solve the sub-problems, we obtain the first safe Bayesian optimization algorithm with theoretical guarantees applicable in high-dimensional settings. We evaluate our method on multiple synthetic benchmarks, where we obtain competitive performance. Further, we deploy our algorithm to optimize the beam intensity of the Swiss Free Electron Laser with up to 40 parameters while satisfying safe operation constraints.
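The idea of reducing a high-dimensional search to a sequence of one-dimensional sub-problems, as described in the abstract above, can be sketched as follows. This is not LineBO itself: the Gaussian process model and acquisition step are replaced by a plain grid search along each line, and the directions are drawn uniformly at random. The function name `line_search_descent` and all of its parameters are assumptions made for illustration.

```python
import math
import random


def line_search_descent(f, x0, n_iters=50, radius=1.0, n_grid=41, seed=0):
    """Minimize f by repeatedly solving a 1-D sub-problem along a random line.

    Each sub-problem is solved by grid search over step sizes in
    [-radius, radius], a stand-in for a Bayesian optimization sub-solver.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iters):
        # Draw a random unit direction that defines the current line.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Search along the line; the current point is kept unless a
        # candidate is strictly better, so f never increases.
        best_x, best_val = x, f(x)
        for j in range(n_grid):
            step = -radius + 2.0 * radius * j / (n_grid - 1)
            cand = [xi + step * di for xi, di in zip(x, d)]
            val = f(cand)
            if val < best_val:
                best_x, best_val = cand, val
        x = best_x
    return x, f(x)
```

For example, `line_search_descent(lambda x: sum(v * v for v in x), [1.0] * 5)` returns a point whose objective value is lower than at the starting point. Each iteration only ever searches a one-dimensional set, which is the property that keeps the sub-problems cheap regardless of the ambient dimension.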

artificial intelligence, optimization, optimization problem, (18 more...)

arXiv.org Machine Learning

1902.03229

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Information-Directed Exploration for Deep Reinforcement Learning

Nikolov, Nikolay, Kirschner, Johannes, Berkenkamp, Felix, Krause, Andreas

arXiv.org Artificial Intelligence, Dec-18-2018

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.

computer game, survey article, upstream oil & gas, (20 more...)

arXiv.org Artificial Intelligence

1812.07544

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.68)
Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
