Learning to Act with Affordance-Aware Multimodal Neural SLAM
We focus on the ALFRED challenge Shridhar et al. (2020), where an agent must follow human instructions to complete long-horizon household tasks in indoor scenes (simulated in AI2Thor Kolve et al. (2017)). Each task in ALFRED consists of several subgoals involving either navigation (moving through the environment) or object interaction (manipulating at least one object). The language input contains a high-level task description and a sequence of low-level step-by-step instructions, each corresponding to a subgoal. The agent is a simulated robot that observes the environment only through a front-view RGB camera with a relatively small field of view. The agent's own state is a 5-tuple (x,y,r,h,o), where x,y is its 2D position, r the horizontal rotation angle, h the vertical camera angle (also called the "horizon"), and o the type of object held in its hand.
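As a concrete illustration, the 5-tuple agent state can be represented as a small record type. This is only a sketch; the `AgentState` class, its field types, and the example values are our own illustrative assumptions, not part of the ALFRED or AI2Thor APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentState:
    """Illustrative agent state (x, y, r, h, o); a hypothetical
    container, not an actual ALFRED/AI2Thor data structure."""
    x: float           # 2D position, x coordinate
    y: float           # 2D position, y coordinate
    r: int             # horizontal rotation angle (degrees)
    h: int             # vertical camera angle, the "horizon" (degrees)
    o: Optional[str]   # object type held in hand; None if empty

# Example: agent at the origin, rotated 90 degrees, camera level,
# holding an (assumed) "Apple" object.
state = AgentState(x=0.0, y=0.0, r=90, h=0, o="Apple")
print(state)
```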