How GPT learns layer by layer