LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
