An Active Information Seeking Model for Goal-oriented Vision-and-Language Tasks
