OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding