How to Merge Your Multimodal Models Over Time?