SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM

Neural Information Processing Systems 

However, current Vid-LLMs struggle to simultaneously retain high-quality frame-level semantic information (i.e., a sufficient
