Local to Global: Learning Dynamics and Effect of Initialization for Transformers