Agents
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
We introduce an offline multi-agent reinforcement learning (offline MARL) framework that utilizes previously collected data without additional online data collection. Our method reformulates offline MARL as a sequence modeling problem and thus builds on the simplicity and scalability of the Transformer architecture. Following the paradigm of centralized training and decentralized execution, we first train a teacher policy that has privileged access to every agent's observations, actions, and rewards. After the teacher policy has identified and recombined the "good" behavior in the dataset, we create separate student policies and distill into them not only the teacher policy's features but also the structural relations among different agents' features. We show that our framework significantly improves performance on a range of tasks and outperforms state-of-the-art offline MARL baselines. Furthermore, we demonstrate that, compared with baselines, the proposed method has a better convergence rate, is more sample efficient, and is more robust to varying demonstration quality.
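As a rough illustration of the two distillation targets described above, the sketch below computes a feature term (per-agent mean squared error between teacher and student features) and a relational term (mean squared difference between the teacher's and the students' pairwise cosine-similarity matrices across agents). This is a minimal sketch under stated assumptions, not the authors' loss: the per-agent feature vectors, the function names, the use of cosine similarity to encode "structural relations", and the weights `alpha` and `beta` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical representation: one feature vector per agent.
Features = Sequence[Sequence[float]]


def _mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def _cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def relation_matrix(features: Features) -> List[List[float]]:
    """Pairwise cosine similarities between all agents' features (assumed relation encoding)."""
    return [[_cosine(fi, fj) for fj in features] for fi in features]


def feature_loss(teacher: Features, student: Features) -> float:
    """Per-agent feature matching, averaged over agents."""
    return sum(_mse(t, s) for t, s in zip(teacher, student)) / len(teacher)


def relation_loss(teacher: Features, student: Features) -> float:
    """Match the students' inter-agent relation matrix to the teacher's."""
    rt, rs = relation_matrix(teacher), relation_matrix(student)
    n = len(teacher)
    return sum((rt[i][j] - rs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def distillation_loss(teacher: Features, student: Features,
                      alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical weighted sum of the feature and relation terms."""
    return alpha * feature_loss(teacher, student) + beta * relation_loss(teacher, student)


teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[2.0, 0.0], [0.0, 2.0]]  # same directions, doubled magnitude
print(feature_loss(teacher, student))   # 0.5
print(relation_loss(teacher, student))  # approximately 0: cosine similarity ignores scale
```

Because the relational term compares normalized similarities, a student can match the teacher's inter-agent structure even when its feature magnitudes differ, which is why the sketch keeps it separate from the feature-matching term.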
On Sample Optimality in Personalized Collaborative and Federated Learning
In personalized federated learning, each member of a potentially large set of agents aims to train a model that minimizes its loss function averaged over its local data distribution. We study this problem through the lens of stochastic optimization, focusing on a scenario with a large number of agents, each of which possesses very few data samples from its local data distribution. Specifically, we prove novel matching lower and upper bounds on the number of samples required from all agents to approximately minimize the generalization error of a fixed agent. We provide strategies matching these lower bounds, based on a gradient filtering approach: given prior knowledge of some notion of distance between local data distributions, agents filter and aggregate stochastic gradients received from other agents in order to achieve an optimal bias-variance trade-off. Finally, we quantify the impact of using rough estimates of the distances between agents' local distributions, based on a very small number of local samples.
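The gradient filtering step can be illustrated with a minimal sketch. It assumes, for illustration only, that the target agent already knows a scalar distance to every other agent's distribution and that "filtering" means keeping the agents within a fixed radius and averaging their gradients uniformly. The function name `filter_and_aggregate` and the `radius` parameter are hypothetical, and the paper's actual filtering rule and weighting may differ.

```python
from typing import List, Sequence


def filter_and_aggregate(gradients: Sequence[Sequence[float]],
                         distances: Sequence[float],
                         radius: float) -> List[float]:
    """Average the stochastic gradients of agents within `radius` of the target agent.

    Assumptions (hypothetical): distances[j] is the known distance between the
    target agent's local distribution and agent j's; the target agent itself has
    distance 0, so it is kept whenever radius >= 0.
    """
    if len(gradients) != len(distances):
        raise ValueError("need one distance per gradient")
    kept = [g for g, d in zip(gradients, distances) if d <= radius]
    if not kept:
        raise ValueError("no agent within the filtering radius")
    dim = len(kept[0])
    # Uniform average over the kept agents: more agents lower the variance,
    # but agents farther away contribute more bias.
    return [sum(g[k] for g in kept) / len(kept) for k in range(dim)]


grads = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # agent 0 is the target
dists = [0.0, 0.1, 5.0]
print(filter_and_aggregate(grads, dists, radius=0.5))  # [2.0, 3.0]
```

Shrinking `radius` toward 0 reduces to purely local training (no bias from other agents, but high variance with few samples), while growing it to include every agent averages all gradients (lower variance, but bias from distant agents); the strategies in the abstract aim to choose this trade-off optimally.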
An AI agent takes over a store and orders too many candles
Andon Market in San Francisco represents a vision, however flawed, of a future when more sophisticated AI agents take over work traditionally done by humans. In San Francisco's upscale Cow Hollow district, the introduction of a boutique selling coffee table games, tote bags and other household items would be pretty unremarkable. However, Andon Market has one key differentiator: It's run by AI. At this store, an artificial intelligence agent named Luna effectively acts as the chief executive officer of the operation. It decides what products to offer and how much to charge for them.
Appendix Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning
In this section, we train policies for different scenarios to validate that the tasks defined in Gigastep can be solved with multi-agent RL algorithms. In particular, we use multi-agent PPO implemented in JAX. In competitive or adversarial MARL, there is no objective reward measure, since the collected reward inherently depends on the relative strength of the opposing agent's policy. Therefore, to measure training progress, we compare the current policy with checkpoints of the same policy saved at earlier training iterations: an improving policy should be able to outperform its earlier counterparts.
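The checkpoint-based progress measure can be sketched as a table of win rates of the current policy against each saved checkpoint. Everything environment-specific here is a hypothetical stand-in rather than Gigastep's API: a policy is reduced to a scalar "strength", and `play_episode` decides the winner with probability a / (a + b). In a real setup, this function would instead roll out an episode of a Gigastep scenario with the two policies controlling opposing teams.

```python
import random
from typing import Dict

# Hypothetical stand-in: a "policy" is just a scalar strength.
Policy = float


def play_episode(a: Policy, b: Policy, rng: random.Random) -> float:
    """Toy competitive episode: returns 1.0 if `a` wins, else 0.0.

    Assumed win probability a / (a + b); a real evaluation would roll out
    the environment with both policies instead.
    """
    return 1.0 if rng.random() < a / (a + b) else 0.0


def win_rates_vs_checkpoints(current: Policy,
                             checkpoints: Dict[int, Policy],
                             episodes: int,
                             seed: int = 0) -> Dict[int, float]:
    """Win rate of `current` against each earlier checkpoint, keyed by training step."""
    rng = random.Random(seed)
    rates: Dict[int, float] = {}
    for step, past in sorted(checkpoints.items()):
        wins = sum(play_episode(current, past, rng) for _ in range(episodes))
        rates[step] = wins / episodes
    return rates


checkpoints = {0: 1.0, 100: 2.0, 200: 4.0}  # hypothetical strengths saved during training
rates = win_rates_vs_checkpoints(current=8.0, checkpoints=checkpoints, episodes=2000)
for step, rate in rates.items():
    print(f"vs checkpoint @ step {step}: win rate {rate:.3f}")
```

A win rate above 0.5 against every earlier checkpoint matches the criterion that an improving policy should outperform its previous counterparts; because outcomes are stochastic, a fixed seed and a reasonably large number of episodes keep the estimates stable.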