glib
Guided Exploration for Efficient Relational Model Learning
Feng, Annie; Kumar, Nishanth; Lozano-Perez, Tomas; Pack-Kaelbling, Leslie
Efficient exploration is critical for learning relational models in large-scale environments with complex, long-horizon tasks. Random exploration methods often collect redundant or irrelevant data, limiting their ability to learn accurate relational models of the environment. Goal-literal babbling (GLIB) improves upon random exploration by setting and planning to novel goals, but its reliance on random actions and random novel goal selection limits its scalability to larger domains. In this work, we identify the principles underlying efficient exploration in relational domains: (1) operator initialization with demonstrations that cover the distinct lifted effects necessary for planning and (2) refining preconditions to collect maximally informative transitions by selecting informative goal-action pairs and executing plans to them. To demonstrate these principles, we introduce Baking-Large, a challenging domain with extensive state-action spaces and long-horizon tasks. We evaluate methods using oracle-driven demonstrations for operator initialization and precondition-targeting guidance to efficiently gather critical transitions. Experiments show that both the oracle demonstrations and precondition-targeting oracle guidance significantly improve sample efficiency and generalization, paving the way for future methods to use these principles to efficiently learn accurate relational models in complex domains.
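Principle (1) depends on demonstrations "covering" distinct lifted effects. The minimal Python sketch below illustrates that idea and is not the authors' code. It makes several assumptions: a STRIPS-style representation in which literals and actions are tuples such as `("holding", "a")` and `("pick", "a")`, and every object in an effect also being an action argument. The names `lift_effect` and `missing_effects` are hypothetical. Two ground transitions count as the same lifted effect when they become identical after objects are replaced with variables.

```python
# Hypothetical sketch: lifting ground transition effects and checking which
# lifted effects a set of demonstrations covers. Not the authors' implementation.


def lift_effect(action, added, deleted):
    """Replace objects with variables ?x0, ?x1, ... in order of first appearance.

    Assumes every object in `added`/`deleted` is also an argument of `action`,
    so the action's argument order fixes the variable names.
    """
    mapping = {}

    def var(obj):
        if obj not in mapping:
            mapping[obj] = f"?x{len(mapping)}"
        return mapping[obj]

    name, *args = action
    lifted_args = tuple(var(obj) for obj in args)

    def lift(literals):
        # Sorting keeps variable assignment deterministic if the assumption fails.
        return frozenset((pred, *map(var, objs)) for pred, *objs in sorted(literals))

    return (name, lifted_args, lift(added), lift(deleted))


def missing_effects(demos, required):
    """Return the required lifted effects that no demonstration exhibits.

    `demos` is a list of (state, action, next_state) triples whose states are
    sets of ground literals; `required` is a set of lifted effects.
    """
    covered = {
        lift_effect(action, frozenset(s2) - frozenset(s1), frozenset(s1) - frozenset(s2))
        for s1, action, s2 in demos
    }
    return set(required) - covered
```

Suppose there are two `pick` demonstrations that differ only in the object picked. They lift to a single effect, so a lifted `put` effect that no demonstration shows would still be reported as missing.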
GLIB: Exploration via Goal-Literal Babbling for Lifted Operator Learning
Chitnis, Rohan; Silver, Tom; Tenenbaum, Joshua; Kaelbling, Leslie Pack; Lozano-Perez, Tomas
We address the problem of efficient exploration for learning lifted operators in sequential decision-making problems without extrinsic goals or rewards. Inspired by human curiosity, we propose goal-literal babbling (GLIB), a simple and general method for exploration in such problems. GLIB samples goals that are conjunctions of literals, which can be understood as specific, targeted effects that the agent would like to achieve in the world, and plans to achieve these goals using the operators being learned. We conduct a case study to elucidate two key benefits of GLIB: robustness to overly general preconditions and efficient exploration in domains with effects at long horizons. We also provide theoretical guarantees and further empirical results, finding GLIB to be effective on a range of benchmark planning tasks.
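The goal-babbling loop described above can be sketched in Python. The simplifications are ours, not the paper's. Literals are ground rather than lifted and are represented as strings. Learned operators are STRIPS-style precondition/add/delete sets, the planner is a plain breadth-first search, and "novel" is taken to mean a goal not sampled before. `Operator`, `plan`, and `glib_step` are hypothetical names. When no sampled goal yields a plan, the sketch falls back to a random action. Feeding the observed transitions back to the operator learner is omitted.

```python
import random
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    """A ground STRIPS-style operator as the learner currently believes it to be."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def plan(state, goal, operators, max_depth=6):
    """Breadth-first search for an action sequence reaching `goal` under `operators`."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        s, actions = frontier.popleft()
        if goal <= s:
            return actions
        if len(actions) >= max_depth:
            continue
        for op in operators:
            if op.pre <= s:
                nxt = (s - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [op.name]))
    return None


def glib_step(state, literals, operators, action_names, babbled, rng,
              max_goal_size=2, tries=10):
    """Babble a novel conjunction of literals and plan to it with the learned model.

    Returns (goal, actions); goal is None when falling back to a random action.
    """
    state = frozenset(state)
    for _ in range(tries):
        size = rng.randint(1, min(max_goal_size, len(literals)))
        goal = frozenset(rng.sample(sorted(literals), size))
        if goal in babbled or goal <= state:
            continue  # not novel, or already true in the current state
        babbled.add(goal)
        actions = plan(state, goal, operators)
        if actions is not None:
            return goal, actions
    return None, [rng.choice(action_names)]
```

Consider a toy model in which `make_a` adds `a`, and `make_b` requires `a` and adds `b`. There, `plan(frozenset(), frozenset({"b"}), ops)` returns `["make_a", "make_b"]`, while any goal containing `c` has no plan and is simply recorded as babbled.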