JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention