Agents
CLAI: A Platform for AI Skills on the Command Line
Agarwal, Mayank, Barroso, Jorge J., Chakraborti, Tathagata, Dow, Eli M., Fadnis, Kshitij, Godoy, Borja, Talamadupula, Kartik
This paper reports on the open source project CLAI (Command Line AI), aimed at bringing the power of AI to the command line interface. The platform sets up the CLI as a new environment for AI researchers to conquer by surfacing the command line as a generic environment that researchers can interface to using a simple sense-act API much like the traditional AI agent architecture. In this paper, we discuss the design and implementation of the platform in detail, through illustrative use cases of new end user interaction patterns enabled by this design, and through quantitative evaluation of the system footprint of a CLAI-enabled terminal. We also report on some early user feedback on its features from an internal survey.
Locally Private Distributed Reinforcement Learning
Ono, Hajime, Takahashi, Tsubasa
We study locally differentially private algorithms for reinforcement learning to obtain a robust policy that performs well across distributed private environments. Our algorithm protects the information of local agents' models from being exploited by adversarial reverse engineering. Since a local policy is strongly being affected by the individual environment, the output of the agent may release the private information unconsciously. In our proposed algorithm, local agents update the model in their environments and report noisy gradients designed to satisfy local differential privacy (LDP) that gives a rigorous local privacy guarantee. By utilizing a set of reported noisy gradients, a central aggregator updates its model and delivers it to different local agents. In our empirical evaluation, we demonstrate how our method performs well under LDP. To the best of our knowledge, this is the first work that actualizes distributed reinforcement learning under LDP. This work enables us to obtain a robust agent that performs well across distributed private environments.
Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks
Suarez, Joseph, Du, Yilun, Mordach, Igor, Isola, Phillip
Progress in multiagent intelligence research is fundamentally limited by the number and quality of environments available for study. In recent years, simulated games have become a dominant research platform within reinforcement learning, in part due to their accessibility and interpretability. Previous works have targeted and demonstrated success on arcade, first person shooter (FPS), real-time strategy (RTS), and massive online battle arena (MOBA) games. Our work considers massively multiplayer online role-playing games (MMORPGs or MMOs), which capture several complexities of real-world learning that are not well modeled by any other game genre. We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO. We further demonstrate that standard policy gradient methods and simple baseline models can learn interesting emergent exploration and specialization behaviors in this setting.
Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory
Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led a dramatic increase in the number of methods and applications. Recent works are exploring learning beyond single-agent scenarios and considering multi-agent scenarios. However, they are faced with lots of challenges and are seeking for help from traditional game-theoretic algorithms, which, in turn, show bright application promise combined with modern algorithms and boosting computing power. In this survey, we first introduce basic concepts and algorithms in single agent RL and multi-agent systems; then, we summarize the related algorithms from three aspects. Solution concepts from game theory give inspiration to algorithms which try to evaluate the agents or find better solutions in multi-agent systems. Fictitious self-play becomes popular and has a great impact on the algorithm of multi-agent reinforcement learning. Counterfactual regret minimization is an important tool to solve games with incomplete information, and has shown great strength when combined with deep learning.
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
Papoudakis, Georgios, Albrecht, Stefano V.
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and successfully learn to interact with the other agents that have fixed policies. Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to opponent's observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks against another modeling method.
Fictitious Play Outperforms Counterfactual Regret Minimization
In two-player zero-sum games a Nash equilibrium strategy is guaranteed to win (or tie) in expectation against any opposing strategy by the minimax theorem. In games with m ore than two players there can be multiple equilibria with different values to the players, and follow ing one has no performance guarantee; however, it was shown that a Nash equilibrium strategy defeated a variet y of agents submitted for a class project in a 3-player imperfect-information game, Kuhn poker [13]. Thi s demonstrates that Nash equilibrium strategies can be successful in practice despite the fact that they do no t have a performance guarantee. While Nash equilibrium can be computed in polynomial time fo r two-player zero-sum games, it is PPAD-hard to compute for nonzero-sum and games with 3 or mor e agents and widely believed that no efficient algorithms exist [8, 9]. Counterfactual regret mi nimization (CFR) is an iterative self-play procedure that has been proven to converge to Nash equilibrium in two-p layer zero-sum [28].
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
We consider the problem of off-policy evaluation for reinforcement learning, where the goal is to estimate the expected reward of a target policy $\pi$ using offline data collected by running a logging policy $\mu$. Standard importance-sampling based approaches for this problem suffer from a variance that scales exponentially with time horizon $H$, which motivates a splurge of recent interest in alternatives that break the "Curse of Horizon" (Liu et al. 2018, Xie et al. 2019). In particular, it was shown that a marginalized importance sampling (MIS) approach can be used to achieve an estimation error of order $O(H^3/ n)$ in mean square error (MSE) under an episodic Markov Decision Process model with finite states and potentially infinite actions. The MSE bound however is still a factor of $H$ away from a Cramer-Rao lower bound of order $\Omega(H^2/n)$. In this paper, we prove that with a simple modification to the MIS estimator, we can asymptotically attain the Cramer-Rao lower bound, provided that the action space is finite. We also provide a general method for constructing MIS estimators with high-probability error bounds.
Facebook AI Researchers Achieve a 107x Speedup for Training Virtual Agents – NVIDIA Developer News Center
Navigating a new indoor space without any prior knowledge or even a map is a challenging task for a human, let alone a robot. To help develop intelligent machines that interact more effectively with complex 3D environments, Facebook researchers developed a GPU-accelerated deep reinforcement learning model that achieves near 100 percent success in navigating a variety of virtual environments without a pre-provided map. To achieve this breakthrough, the team focused their work on developing an efficient approach to scaling RL models, which require a significant number of training samples, using multi-node distribution. "A single parameter server and thousands of (typically CPU) workers may be fundamentally incompatible with the needs of modern computer vision and robotics communities," the researchers explained in their post, Near-perfect point-goal navigation from 2.5 billion frames of experience. "Unlike Gym or Atari, 3D simulators require GPU acceleration…. The desired agents operate from high-dimensional inputs (pixels) and use deep networks, such as ResNet50, which strain the parameter server. Thus, existing distributed RL architectures do not scale and there is a need to develop a new distributed architecture."
Meet Neo the Intelligent Agent
Now we bring you Neo, the intelligent and autonomous agent, designed with the eight fundamentals in mind, and running on One Network's Real Time Value Network platform. Neo transcends silos to consider all business partners, all systems and all relevant constraints across the network, for more effective decisions. He provides decision support and can run autonomously, executing his own recommendations if you want him to and when it's appropriate. Learn how Neo can lift sell through and improve service levels, while lowering costs throughout your network. Fill out the form on the right and we'll send it to you now, along with access to our entire library of business transforming insights.
Bounded Incentives in Manipulating the Probabilistic Serial Rule
Wang, Zihe, Wei, Zhide, Zhang, Jie
The Probabilistic Serial mechanism is well-known for its desirable fairness and efficiency properties. It is one of the most prominent protocols for the random assignment problem. However, Probabilistic Serial is not incentive-compatible, thereby these desirable properties only hold for the agents' declared preferences, rather than their genuine preferences. A substantial utility gain through strategic behaviors would trigger self-interested agents to manipulate the mechanism and would subvert the very foundation of adopting the mechanism in practice. In this paper, we characterize the extent to which an individual agent can increase its utility by strategic manipulation. We show that the incentive ratio of the mechanism is $\frac{3}{2}$. That is, no agent can misreport its preferences such that its utility becomes more than 1.5 times of what it is when reports truthfully. This ratio is a worst-case guarantee by allowing an agent to have complete information about other agents' reports and to figure out the best response strategy even if it is computationally intractable in general. To complement this worst-case study, we further evaluate an agent's utility gain on average by experiments. The experiments show that an agent' incentive in manipulating the rule is very limited. These results shed some light on the robustness of Probabilistic Serial against strategic manipulation, which is one step further than knowing that it is not incentive-compatible.