VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Oct-10-2025, 16:14:09 GMT–Neural Information Processing Systems

A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual

benchmark, language model, vision token

Country:
- North America > United States (0.04)
- Asia
  - China (0.04)
  - Singapore (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Education (0.68)
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
