Supplementary Material for Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

A. GNN Architectural Details

Neural Information Processing Systems

When different object types are present in the environment, we also include static object features in the object states s_i^t. This can be viewed as a concatenation of a dynamic and a static graph [44]. For the extrinsic phase, we take the learned model with the listed architectural settings to solve downstream tasks zero-shot. In both environments, 2000 transitions are generated within one training iteration of CEE-US. The actuated agent (i.e., robot) state is given by s_agent.
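To make the node-input construction concrete, here is a minimal NumPy sketch of how per-object dynamic features and static features could be concatenated into the GNN's node states, with the agent state passed in separately. This is an illustrative sketch only: build_object_states, gnn_world_model_step, edge_fn, and node_fn are hypothetical names, not the paper's implementation.

```python
import numpy as np

def build_object_states(dyn_feats, static_feats):
    """Concatenate per-object dynamic features (e.g., pose, velocity)
    with static features (e.g., an object-type one-hot): the
    'dynamic + static graph' view of the node inputs.

    dyn_feats:    (num_objects, d_dyn)    changes every timestep
    static_feats: (num_objects, d_static) fixed per object
    returns:      (num_objects, d_dyn + d_static)
    """
    return np.concatenate([dyn_feats, static_feats], axis=-1)

def gnn_world_model_step(s_agent, s_objects, action, edge_fn, node_fn):
    """One predictive step of a structured world model over a fully
    connected object graph: aggregate pairwise object-object messages,
    then update each node conditioned on the agent state and action."""
    n = s_objects.shape[0]
    next_states = []
    for i in range(n):
        # sum messages from all other objects (fully connected graph)
        msgs = sum(edge_fn(s_objects[i], s_objects[j])
                   for j in range(n) if j != i)
        next_states.append(node_fn(s_objects[i], msgs, s_agent, action))
    return np.stack(next_states)
```

Because the edge and node updates are shared across all objects, the same learned model can be applied to scenes with different numbers of objects, which is what the zero-shot generalization below relies on.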






Regularity as Intrinsic Reward for Free Play

Sancaktar, Cansu, Piater, Justus, Martius, Georg

arXiv.org Artificial Intelligence

We propose regularity as a novel reward signal for intrinsically motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks. Code and videos are available at https://sites.google.com/view/rair-project.

Figure 1: Regularity as intrinsic reward yields ordered and symmetric patterns.
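The core regularity objective can be illustrated with a small sketch: treat the pairwise relations between objects (here, discretized relative positions) as samples from a distribution and reward low entropy, so that arrangements in which many pairs share the same relation score highly. This is a hedged reconstruction of the idea in the abstract; rair_reward, the choice of relation, and the binning are illustrative assumptions, not the paper's exact operationalization.

```python
import numpy as np
from collections import Counter

def rair_reward(positions, bin_size=0.05):
    """Regularity as the negative entropy of pairwise relations.

    positions: (num_objects, dim) object positions. Relations are
    relative positions, discretized into bins of width bin_size; a
    state is 'regular' when many pairs share the same relation, i.e.,
    the empirical relation distribution has low entropy.
    """
    n = positions.shape[0]
    if n < 2:
        return 0.0
    relations = []
    for i in range(n):
        for j in range(n):
            if i != j:
                rel = np.round((positions[i] - positions[j]) / bin_size)
                relations.append(tuple(rel.astype(int)))
    counts = np.array(list(Counter(relations).values()), dtype=float)
    probs = counts / counts.sum()
    # negative entropy: higher reward for more regular arrangements
    return float(np.sum(probs * np.log(probs)))
```

Under such an objective, a line of equally spaced blocks or a stack repeats one relation many times and therefore scores far higher than a random scattering, which is why combining it with an epistemic-uncertainty bonus biases free play toward building structures.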


Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

Sancaktar, Cansu, Blaes, Sebastian, Martius, Georg

arXiv.org Artificial Intelligence

It has been a long-standing dream to design artificial agents that explore their environment efficiently via intrinsic motivation, similar to how children perform curious free play. Despite recent advances in intrinsically motivated reinforcement learning (RL), sample-efficient exploration in object manipulation scenarios remains a significant challenge as most of the relevant information lies in the sparse agent-object and object-object interactions. In this paper, we propose to use structured world models to incorporate relational inductive biases in the control loop to achieve sample-efficient and interaction-rich exploration in compositional multi-object environments. By planning for future novelty inside structured world models, our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time. Instead of using models only to compute intrinsic rewards, as commonly done, our method showcases that the self-reinforcing cycle between good models and good exploration also opens up another avenue: zero-shot generalization to downstream tasks via model-based planning. After the entirely intrinsic, task-agnostic exploration phase, our method solves challenging downstream tasks such as stacking, flipping, pick & place, and throwing, generalizing to unseen numbers and arrangements of objects without any additional training.
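A common way to operationalize "planning for future novelty" is to score imagined action sequences by the disagreement of an ensemble of world models and execute the most novel plan. The sketch below uses plain random shooting as a simplified stand-in for the CEM-style planner used in practice; models, sample_actions, and all other names are hypothetical.

```python
import numpy as np

def disagreement(preds):
    """Epistemic-uncertainty bonus: variance across ensemble
    predictions of the next state. preds: (num_models, state_dim)."""
    return float(preds.var(axis=0).mean())

def plan_for_novelty(models, state, sample_actions,
                     horizon=10, num_candidates=64):
    """Random-shooting sketch: roll each candidate action sequence
    through every ensemble member and pick the sequence whose imagined
    trajectory accumulates the most model disagreement."""
    best_seq, best_score = None, -np.inf
    for _ in range(num_candidates):
        actions = sample_actions(horizon)        # (horizon, action_dim)
        states = [state.copy() for _ in models]  # each member rolls out itself
        score = 0.0
        for t in range(horizon):
            preds = np.stack([m(s, actions[t])
                              for m, s in zip(models, states)])
            score += disagreement(preds)
            states = list(preds)
        if score > best_score:
            best_seq, best_score = actions, score
    return best_seq
```

For the extrinsic phase described above, the same model-based planner can be reused with the disagreement bonus swapped for a task reward evaluated on the imagined states, which is what makes the downstream tasks solvable zero-shot.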