Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models