 Reinforcement Learning







42c8938e4cf5777700700e642dc2a8cd-AuthorFeedback.pdf

Neural Information Processing Systems

It also makes no assumptions on the sparseness of the transitions. Our experiments reflect this as well, as the transition probabilities are drawn from a uniform distribution with no sparseness assumptions and would be more difficult than sparse cases.
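As a rough illustration of what a dense, non-sparse transition model can look like, the sketch below draws every next-state weight from a uniform distribution and normalises each row. The function name `random_dense_transitions`, the row normalisation, and the state/action counts are assumptions made for this example; the excerpt above does not state how the authors constructed their matrices.

```python
import random


def random_dense_transitions(n_states, n_actions, seed=0):
    """Return P[s][a] as a list of next-state probabilities that sums to 1.

    Hypothetical helper: weights are uniform draws, then normalised per row.
    """
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            # 1.0 - rng.random() lies in (0, 1], so every next state
            # receives nonzero probability mass (no sparsity).
            raw = [1.0 - rng.random() for _ in range(n_states)]
            total = sum(raw)
            per_action.append([w / total for w in raw])
        P.append(per_action)
    return P


P = random_dense_transitions(n_states=4, n_actions=2)
```

Because every entry is strictly positive, each state can reach every other state under every action, which is the opposite of a sparse transition structure.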



cf5a019ae9c11b4be88213ce3f85d85c-Paper-Conference.pdf

Neural Information Processing Systems

Here, we focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. However, it remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to search for a policy only with a set of examples from a target distribution instead of a handcrafted reward function. We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution.
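To make the idea of a gradient field that "indicates a direction to increase the likelihood" concrete, here is a minimal sketch. It does not train a network with score matching as the abstract describes; instead it swaps in the analytic score of a Gaussian kernel density estimate built from a handful of example positions. The names `kde_score`, `kde_log_density`, and `follow_field`, the bandwidth, the step size, and the single 2D "object" are all assumptions for illustration.

```python
import math


def _kernel_logits(x, examples, bandwidth):
    # Log of each Gaussian kernel at x, up to a shared additive constant.
    h2 = bandwidth ** 2
    return [-sum((a - b) ** 2 for a, b in zip(x, e)) / (2 * h2) for e in examples]


def kde_log_density(x, examples, bandwidth=0.5):
    """Log density of the KDE at x, with the normalising constant dropped."""
    logits = _kernel_logits(x, examples, bandwidth)
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logits))


def kde_score(x, examples, bandwidth=0.5):
    """Gradient of the KDE log density at x: sum_i w_i (e_i - x) / h^2."""
    h2 = bandwidth ** 2
    logits = _kernel_logits(x, examples, bandwidth)
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return tuple(
        sum(w * (e[d] - x[d]) for w, e in zip(weights, examples)) / (total * h2)
        for d in range(len(x))
    )


def follow_field(x, examples, step=0.05, n_steps=50):
    """Move a single object along the score field by plain gradient ascent."""
    for _ in range(n_steps):
        g = kde_score(x, examples)
        x = tuple(xi + step * gi for xi, gi in zip(x, g))
    return x


# Hypothetical example positions drawn from a "target" layout.
examples = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (0.1, 0.2)]
start = (2.0, -1.5)
end = follow_field(start, examples)
```

Following the field only requires examples from the target distribution, not a reward function or a goal state, which mirrors the setting described above. A learned field would replace `kde_score` with a trained model.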