TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
