A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning