Agents
Learning Affordance Landscapes for Interaction Exploration in 3D Environments
Embodied agents operating in human spaces must be able to master how their environment works: what objects can the agent use, and how can it use them? We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen).
14cfdb59b5bda1fc245aadae15b1984a-AuthorFeedback.pdf
We thank the reviewers for their insightful comments. We will incorporate the feedback and suggestions into the next revision of the paper. A: The messages exchanged between the agents generally convey agent status information (location, health status, etc.). Over time, the communication level gradually decreases as agents move to the right position (step 250, 430). We can also design similar experiments to infer the meaning of other types of messages. A: VBC is most beneficial to multi-agent systems that require quick decision making and low communication overhead.
Discovery of Useful Questions as Auxiliary Tasks
Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh
Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation. Specifically, our method uses non-myopic meta-gradients to learn GVF-questions such that learning answers to them, as an auxiliary task, induces useful representations for the main task faced by the RL agent. We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent.
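As a rough illustration of what a GVF "question" specifies, the sketch below computes the return that a GVF answer would be trained to predict, given a per-step cumulant and a per-step discount. It is not the authors' method: in the paper the cumulants and discounts are themselves discovered via meta-gradients, whereas here they are plain lists supplied by hand, and the function name `gvf_returns` and the `bootstrap` parameter are hypothetical.

```python
def gvf_returns(cumulants, discounts, bootstrap=0.0):
    """Backward recursion G_t = c_t + gamma_t * G_{t+1} over one trajectory.

    cumulants[t] and discounts[t] together define the GVF "question";
    bootstrap stands in for a value estimate beyond the end of the trajectory
    (an assumption made for this sketch).
    """
    returns = [0.0] * len(cumulants)
    g = bootstrap
    # Walk the trajectory from the last step to the first.
    for t in reversed(range(len(cumulants))):
        g = cumulants[t] + discounts[t] * g
        returns[t] = g
    return returns


example = gvf_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Setting the cumulant to the environment reward and the discount to a constant recovers an ordinary discounted return, which is why GVFs are often described as a generalization of value functions.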
Market Scoring Rules Act As Opinion Pools For Risk-Averse Agents
Mithun Chakraborty, Sanmay Das
A market scoring rule (MSR), a popular tool for designing algorithmic prediction markets, is an incentive-compatible mechanism for the aggregation of probabilistic beliefs from myopic risk-neutral agents. In this paper, we add to a growing body of research aimed at understanding the precise manner in which the price process induced by an MSR incorporates private information from agents who deviate from the assumption of risk-neutrality. We first establish that, for a myopic trading agent with a risk-averse utility function, an MSR satisfying mild regularity conditions elicits the agent's risk-neutral probability conditional on the latest market state rather than her true subjective probability. Hence, we show that an MSR under these conditions effectively behaves like a more traditional method of belief aggregation, namely an opinion pool, for agents' true probabilities.
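For readers unfamiliar with MSRs, the sketch below illustrates the risk-neutral baseline that the abstract starts from, using the logarithmic market scoring rule (LMSR) as one well-known instance. It does not reproduce the paper's risk-averse analysis. The function names, the liquidity parameter `b`, and the grid search are assumptions made for illustration only.

```python
import math


def lmsr_prices(quantities, b=1.0):
    """LMSR prices: a softmax of outstanding share quantities scaled by b."""
    m = max(quantities)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / b) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def log_msr_payoff(old_prices, new_prices, outcome, b=1.0):
    """Payoff for moving the market from old_prices to new_prices if `outcome` occurs."""
    return b * (math.log(new_prices[outcome]) - math.log(old_prices[outcome]))


def expected_payoff(belief, old_prices, new_prices, b=1.0):
    """Expected payoff of a report under the trader's own belief (risk-neutral)."""
    return sum(
        p * log_msr_payoff(old_prices, new_prices, i, b)
        for i, p in enumerate(belief)
    )


def best_binary_report(belief, old_prices, grid=99):
    """Grid search over binary reports (k/100, 1 - k/100) for the best expected payoff."""
    candidates = [(k / (grid + 1), 1 - k / (grid + 1)) for k in range(1, grid + 1)]
    return max(candidates, key=lambda r: expected_payoff(belief, old_prices, r))


start = lmsr_prices([0.0, 0.0])          # a fresh two-outcome market
best = best_binary_report((0.7, 0.3), start)
```

Under these assumptions the grid search picks the report closest to the trader's own belief, which is the incentive-compatibility property for myopic risk-neutral agents. The abstract's point is that a risk-averse trader would instead move the price to her risk-neutral probability.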