Neural Sequence Model Training via $\alpha$-divergence Minimization