Atari 2600 games are deterministic, so a fixed policy yields a fixed sequence of actions. This article investigates three methods for adding randomness: random initialization, epsilon-greedy action selection, and epsilon-repeat action selection. These methods are evaluated by how well they derail a memorizing agent without hurting the performance of a randomized agent. Results indicate that epsilon-repeat action selection best fits the desired criteria, and that lower values of epsilon than previously used suffice to derail the memorizing agent.
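The two action-selection schemes named above can be sketched as follows. This is a minimal illustration, not the article's implementation: function names and the tabular Q-value interface are assumptions. Epsilon-greedy randomizes which action the agent *chooses*; epsilon-repeat (often called "sticky actions") randomizes which action the environment *executes* by occasionally repeating the previous action.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action;
    otherwise pick the greedy (highest-valued) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def epsilon_repeat(chosen_action, previous_action, epsilon, rng=random):
    """With probability epsilon ignore the agent's chosen action and
    repeat the previously executed action ("sticky actions");
    otherwise execute the chosen action."""
    if previous_action is not None and rng.random() < epsilon:
        return previous_action
    return chosen_action
```

Note the difference in where the noise is injected: epsilon-repeat perturbs the action sequence even for a fully memorizing agent, which is why it can derail memorization without requiring large epsilon.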
This manifesto proposes a simple model of metareasoning that constitutes a general framework for organizing research on this topic. The claim is that metareasoning, like the action-perception cycle of reasoning, is composed of the introspective monitoring of reasoning and the subsequent meta-level control of reasoning. This model holds for single-agent and multiagent systems and is broad enough to include models of self. We offer the model as a short conversation piece against which the community can compare and contrast individual theories.
Comirit is a framework for commonsense reasoning that combines simulation, logical deduction and passive machine learning. While a passive, observation-driven approach to learning is safe and highly conservative, it limits the agent to interacting only with those objects that it has previously observed. In this paper we describe a preliminary exploration of methods for extending Comirit to allow safe action selection in uncertain situations, and to allow reward-maximizing selection of behaviors.
This paper explores the problem of task learning and planning, contributing the Action-Category Representation (ACR) to improve the computational performance of both planning and Reinforcement Learning (RL). ACR is an algorithm-agnostic, abstract data representation that maps objects to action categories (groups of actions), inspired by the psychological concept of action codes. We validate our approach in the StarCraft and Lightworld domains; our results demonstrate several benefits of ACR relating to improved computational performance of planning and RL, achieved by reducing the action space available to the agent.
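The core idea of mapping objects to action categories to shrink the action space can be sketched as below. This is a hypothetical illustration under assumed names: the category labels, object names, and lookup tables are invented for the example and are not the paper's actual ACR data structures or StarCraft action set.

```python
from typing import Dict, List, Set

# Hypothetical action categories: each category groups related actions.
ACTION_CATEGORIES: Dict[str, List[str]] = {
    "attack": ["attack_melee", "attack_ranged"],
    "gather": ["harvest", "return_cargo"],
    "build":  ["build_barracks", "build_depot"],
}

# Hypothetical object-to-category map: which categories an object affords.
OBJECT_TO_CATEGORIES: Dict[str, Set[str]] = {
    "enemy_unit": {"attack"},
    "mineral_field": {"gather"},
    "construction_site": {"build"},
}

def applicable_actions(visible_objects: List[str]) -> List[str]:
    """Restrict the action space to actions whose category is licensed
    by at least one currently visible object."""
    categories: Set[str] = set()
    for obj in visible_objects:
        categories |= OBJECT_TO_CATEGORIES.get(obj, set())
    return sorted(a for c in categories for a in ACTION_CATEGORIES[c])
```

The benefit the abstract describes follows directly: a planner or RL agent branching only over `applicable_actions(...)` considers two actions when facing an enemy unit rather than the full six-action set.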