Introduction to Visual Question Answering: Datasets, Approaches and Evaluation - Tryolabs Blog
Historically, building a system that can answer natural language questions about any image has been considered a very ambitious goal. Consider a simple question: how many players are in the image? We can count them and see that there are eleven, since we are smart enough not to count the referee. Although humans can normally perform this task without much difficulty, building a system with these capabilities has always seemed closer to science fiction than to the current possibilities of Artificial Intelligence (AI). However, with the advent of Deep Learning (DL), we have witnessed enormous research progress in Visual Question Answering (VQA), to the point that systems capable of answering such questions are emerging with promising results. In this article I will briefly go through some of the current datasets, approaches and evaluation metrics in VQA, and discuss how this challenging task can be applied to real-life use cases.
Mar-14-2018, 21:15:20 GMT