Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap