Agents
Towards Model Agnostic Federated Learning Using Knowledge Distillation
Afonin, Andrei, Karimireddy, Sai Praneeth
An often unquestioned assumption underlying most current federated learning algorithms is that all the participants use identical model architectures. In this work, we initiate a theoretical study of model agnostic communication protocols which would allow data holders (agents) using different models to collaborate with each other and perform federated learning. We focus on the setting where the two agents are attempting to perform kernel regression using different kernels (and hence have different models). Our study yields a surprising result -- the most natural algorithm of using alternating knowledge distillation (AKD) imposes overly strong regularization and may lead to severe under-fitting. Our theory also shows an interesting connection between AKD and the alternating projection algorithm for finding intersection of sets. Leveraging this connection, we propose a new algorithms which improve upon AKD. Our theoretical predictions also closely match real world experiments using neural networks. Thus, our work proposes a rich yet tractable framework for analyzing and developing new practical model agnostic federated learning algorithms.
Universal Decision Models
Humans are universal decision makers: we reason causally to understand the world; we act competitively to gain advantage in commerce, games, and war; and we are able to learn to make better decisions through trial and error. In this paper, we propose Universal Decision Model (UDM), a mathematical formalism based on category theory. Decision objects in a UDM correspond to instances of decision tasks, ranging from causal models and dynamical systems such as Markov decision processes and predictive state representations, to network multiplayer games and Witsenhausen's intrinsic models, which generalizes all these previous formalisms. A UDM is a category of objects, which include decision objects, observation objects, and solution objects. Bisimulation morphisms map between decision objects that capture structure-preserving abstractions. We formulate universal properties of UDMs, including information integration, decision solvability, and hierarchical abstraction. We describe universal functorial representations of UDMs, and propose an algorithm for computing the minimal object in a UDM using algebraic topology. We sketch out an application of UDMs to causal inference in network economics, using a complex multiplayer producer-consumer two-sided marketplace.
Learning to Ground Multi-Agent Communication with Autoencoders
Lin, Toru, Huh, Minyoung, Stauffer, Chris, Lim, Ser-Nam, Isola, Phillip
Communication requires having a common language, a lingua franca, between agents. This language could emerge via a consensus process, but it may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, where agents ground their language in representations of the observed world. We demonstrate a simple way to ground language in learned representations, which facilitates decentralized multi-agent communication and coordination. We find that a standard representation learning algorithm -- autoencoding -- is sufficient for arriving at a grounded common language. When agents broadcast these representations, they learn to understand and respond to each other's utterances and achieve surprisingly strong task performance across a variety of multi-agent communication environments.
Learning to Control using Image Feedback
Raghavan, Krishnan, Narayanan, Vignesh, Saraangapani, Jagannathan
Learning to control complex systems using non-traditional feedback, e.g., in the form of snapshot images, is an important task encountered in diverse domains such as robotics, neuroscience, and biology (cellular systems). In this paper, we present a two neural-network (NN)-based feedback control framework to design control policies for systems that generate feedback in the form of images. In particular, we develop a deep $Q$-network (DQN)-driven learning control strategy to synthesize a sequence of control inputs from snapshot images that encode the information pertaining to the current state and control action of the system. Further, to train the networks we employ a direct error-driven learning (EDL) approach that utilizes a set of linear transformations of the NN training error to update the NN weights in each layer. We verify the efficacy of the proposed control strategy using numerical examples.
V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL
Jin, Chi, Liu, Qinghua, Wang, Yuanhao, Yu, Tiancheng
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains to be a bottleneck for designing efficient MARL algorithms even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms -- V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\rm th}$ player. This is in sharp contrast to the size of the joint action space which is $\prod_{i=1}^m A_i$. V-learning (in its basic form) is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into a RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Different from Q-learning, it only maintains the estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
ABIDES-Gym: Gym Environments for Multi-Agent Discrete Event Simulation and Application to Financial Markets
Amrouni, Selim, Moulin, Aymeric, Vann, Jared, Vyetrenko, Svitlana, Balch, Tucker, Veloso, Manuela
Model-free Reinforcement Learning (RL) requires the ability to sample trajectories by taking actions in the original problem environment or a simulated version of it. Breakthroughs in the field of RL have been largely facilitated by the development of dedicated open source simulators with easy to use frameworks such as OpenAI Gym and its Atari environments. In this paper we propose to use the OpenAI Gym framework on discrete event time based Discrete Event Multi-Agent Simulation (DEMAS). We introduce a general technique to wrap a DEMAS simulator into the Gym framework. We expose the technique in detail and implement it using the simulator ABIDES as a base. We apply this work by specifically using the markets extension of ABIDES, ABIDES-Markets, and develop two benchmark financial markets OpenAI Gym environments for training daily investor and execution agents. As a result, these two environments describe classic financial problems with a complex interactive market behavior response to the experimental agent's action.
Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance
Lim, Justin, Ji, Christina X, Oberst, Michael, Blecker, Saul, Horwitz, Leora, Sontag, David
Individuals often make different decisions when faced with the same context, due to personal preferences and background. For instance, judges may vary in their leniency towards certain drug-related offenses, and doctors may vary in their preference for how to start treatment for certain types of patients. With these examples in mind, we present an algorithm for identifying types of contexts (e.g., types of cases or patients) with high inter-decision-maker disagreement. We formalize this as a causal inference problem, seeking a region where the assignment of decision-maker has a large causal effect on the decision. Our algorithm finds such a region by maximizing an empirical objective, and we give a generalization bound for its performance. In a semi-synthetic experiment, we show that our algorithm recovers the correct region of heterogeneity accurately compared to baselines. Finally, we apply our algorithm to real-world healthcare datasets, recovering variation that aligns with existing clinical knowledge.
Dynamic population-based meta-learning for multi-agent communication with natural language
Gupta, Abhinav, Lanctot, Marc, Lazaridou, Angeliki
In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language. Previous work using a single set of agents has shown great progress in generalizing to known partners, however it struggles when coordinating with unfamiliar agents. To mitigate that, recent work explored the use of population-based approaches, where multiple agents interact with each other with the goal of learning more generic protocols. These methods, while able to result in good coordination between unseen partners, still only achieve so in cases of simple languages, thus failing to adapt to human partners using natural language. We attribute this to the use of static populations and instead propose a dynamic population-based meta-learning approach that builds such a population in an iterative manner. We perform a holistic evaluation of our method on two different referential games, and show that our agents outperform all prior work when communicating with seen partners and humans. Furthermore, we analyze the natural language generation skills of our agents, where we find that our agents also outperform strong baselines. Finally, we test the robustness of our agents when communicating with out-of-population agents and carefully test the importance of each component of our method through ablation studies.
(Almost) Free Incentivized Exploration from Decentralized Learning Agents
Shi, Chengshuai, Xu, Haifeng, Xiong, Wei, Shen, Cong
Incentivized exploration in multi-armed bandits (MAB) has witnessed increasing interests and many progresses in recent years, where a principal offers bonuses to agents to do explorations on her behalf. However, almost all existing studies are confined to temporary myopic agents. In this work, we break this barrier and study incentivized exploration with multiple and long-term strategic agents, who have more complicated behaviors that often appear in real-world applications. An important observation of this work is that strategic agents' intrinsic needs of learning benefit (instead of harming) the principal's explorations by providing "free pulls". Moreover, it turns out that increasing the population of agents significantly lowers the principal's burden of incentivizing. The key and somewhat surprising insight revealed from our results is that when there are sufficiently many learning agents involved, the exploration process of the principal can be (almost) free. Our main results are built upon three novel components which may be of independent interest: (1) a simple yet provably effective incentive-provision strategy; (2) a carefully crafted best arm identification algorithm for rewards aggregated under unequal confidences; (3) a high-probability finite-time lower bound of UCB algorithms. Experimental results are provided to complement the theoretical analysis.
Stanford reinforcement learning system simulates evolution
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. Hundreds of millions of years of evolution have blessed our planet with a wide variety of lifeforms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensure its survival in its environment. But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after development. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest lifeforms.