Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection