Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity