Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks