Vocabulary embeddings organize linguistic structure early in language model training

Open in new window