Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
