Leonard, Naomi Ehrich
Distributed Bandits: Probabilistic Communication on $d$-regular Graphs
Madhushani, Udari, Leonard, Naomi Ehrich
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.
Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control
Zhong, Yaofeng Desmond, Leonard, Naomi Ehrich
Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.
On Distributed Cooperative Decision-Making in Multiarmed Bandits
Landgren, Peter, Srivastava, Vaibhav, Leonard, Naomi Ehrich
We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.
Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis
Srivastava, Vaibhav, Reverdy, Paul, Leonard, Naomi Ehrich
We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.