Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering