Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport