2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining