Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences?