
Neural Information Processing Systems 

Here, we focus on a more practical setting of object rearrangement: rearranging objects from shuffled layouts toward a normative target distribution without explicit goal specification. This setting remains challenging for AI agents, since it is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations; it is therefore infeasible to directly apply reinforcement learning or imitation learning to the task. This paper aims to learn a policy from only a set of examples drawn from the target distribution, instead of a handcrafted reward function. We employ a score-matching objective to train a Target Gradient Field (TarGF), which indicates, for each object, a direction that increases the likelihood under the target distribution.
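The core idea can be sketched in a toy 1-D version. This is not the paper's setup (TarGF uses a learned score network over full object states; the Gaussian target, the linear score model, and the single noise level below are simplifying assumptions for illustration), but it shows how denoising score matching recovers a gradient field from examples alone, and how following that field moves a shuffled state toward the target distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma_d, sigma_n = 2.0, 0.5, 0.3      # target mean/std, DSM noise level (assumed values)

# The only supervision: examples drawn from the target distribution.
x = rng.normal(mu, sigma_d, size=20000)

# Denoising score matching: perturb each example and regress the score
# model onto -noise / sigma_n^2, whose minimizer is the score of the
# noise-smoothed target distribution.
n = rng.normal(0.0, sigma_n, size=20000)
xt = x + n
y = -n / sigma_n**2

# With a linear score model s(x) = a*x + b, the DSM objective is plain
# least squares, so we solve it in closed form for clarity.
A = np.stack([xt, np.ones_like(xt)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

def score(pos):
    """Learned gradient field: points toward higher target likelihood."""
    return a * pos + b

# "Rearrangement": follow the gradient field from a shuffled position.
pos = mu - 1.5
for _ in range(200):
    pos += 0.01 * score(pos)
```

After training, `score` vanishes near the target mean and points toward it elsewhere, so small gradient steps drive the shuffled position to the target layout. In the paper's actual setting the same direction is produced per object and consumed by the downstream policy rather than followed directly.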
