Measuring Machine Intelligence Through Visual Question Answering

Zitnick, C. Lawrence (Facebook AI Research) | Agrawal, Aishwarya (Virginia Institute of Technology) | Antol, Stanislaw (Virginia Institute of Technology) | Mitchell, Margaret (Microsoft Research) | Batra, Dhruv (Virginia Institute of Technology) | Parikh, Devi (Virginia Institute of Technology)

AI Magazine 

We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine's ability to reason about language and vision. We describe a dataset created for the task, unprecedented in size, that contains over 760,000 human-generated questions about images. With around 10 million human-generated answers available for comparison, machines may be easily evaluated.