Planning to Give Information in Partially Observed Domains with a Learned Weighted Entropy Model

Chitnis, Rohan, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás

arXiv.org Artificial Intelligence 

In many real-world robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Importantly, we should expect the human to have preferences about what information they are given and when they are given it. In this work, we adopt an information-theoretic view of the human's preferences: the human scores a piece of information as a function of the induced reduction in weighted entropy of their belief about the environment state. We formulate this setting as a POMDP and give a practical algorithm for solving it approximately. Then, we give an algorithm that allows the agent to sample-efficiently learn the human's preferences online. Finally, we describe an extension in which the human's preferences are time-varying. We validate our approach experimentally in two planning domains: a 2D robot mining task and a more realistic 3D robot fetching task.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found