Learning Visual Question Answering by Bootstrapping Hard Attention