Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers