Stochastic parrot or world model? How large language models learn