Convergence and Margin of Adversarial Training on Separable Data