Learning to Plan for Language Modeling from Unlabeled Data