This manifesto proposes a simple model of metareasoning that constitutes a general framework to organize research on this topic. The claim is that metareasoning, like the actionperception cycle of reasoning, is composed of the introspective monitoring of reasoning and the subsequent meta-level control of reasoning. This model holds for single agent and multiagent systems and is broad enough to include models of self. We offer the model as a short conversation piece to which the community can compare and contrast individual theories.
This paper explores the problem of task learning and planning, contributing the Action-Category Representation (ACR) to improve computational performance of both Planning and Reinforcement Learning (RL). ACR is an algorithm-agnostic, abstract data representation that maps objects to action categories (groups of actions), inspired by the psychological concept of action codes. We validate our approach in StarCraft and Lightworld domains; our results demonstrate several benefits of ACR relating to improved computational performance of planning and RL, by reducing the action space for the agent.
We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multiagent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results (Claus & Boutilier 1998) by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.