Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances