Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents