How Does Controllability Emerge In Language Models During Pretraining?