Evaluating Multimodal Large Language Models on Educational Textbook Question Answering