Towards Optimal Learning of Language Models