VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual
Oct-10-2025, 16:14:09 GMT
- Country:
- Asia
- North America > United States (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Education (0.68)
- Information Technology (0.46)
- Technology: