Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models