MR. Video: MapReduce as an Effective Principle for Long Video Understanding
Neural Information Processing Systems
The fundamental challenge of long video understanding, e.g., question answering, lies in the sheer number of frames, which makes it infeasible to densely understand local details while also comprehensively digesting the global context, especially within a limited context length. To address this problem, our insight is to process short video segments individually and then combine these segment-level analyses into a final response. This intuition mirrors the well-established MapReduce principle in big data processing and is naturally compatible with inference scaling at the system level. Motivated by this, we propose MR.
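The segment-then-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_into_segments`, `map_reduce_video`, `map_fn`, and `reduce_fn` are hypothetical, and the toy callables below stand in for whatever model would analyze each segment and merge the results.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Partial = TypeVar("Partial")
Answer = TypeVar("Answer")


def split_into_segments(frames: Sequence[Frame], segment_len: int) -> List[Sequence[Frame]]:
    """Cut the frame sequence into consecutive, non-overlapping segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]


def map_reduce_video(
    frames: Sequence[Frame],
    segment_len: int,
    map_fn: Callable[[Sequence[Frame]], Partial],
    reduce_fn: Callable[[List[Partial]], Answer],
) -> Answer:
    """Analyze each short segment independently (map), then combine (reduce)."""
    segments = split_into_segments(frames, segment_len)
    # Map step: each call sees only one short segment, so it fits in a
    # limited context window. The calls are independent of one another.
    partials = [map_fn(segment) for segment in segments]
    # Reduce step: merge the segment-level results into one final response.
    return reduce_fn(partials)


# Toy stand-ins (assumptions for illustration only): frames are integers,
# the "analysis" of a segment is its maximum, and the final answer is the
# maximum over all segment-level results.
toy_frames = list(range(10))
toy_answer = map_reduce_video(toy_frames, segment_len=4, map_fn=max, reduce_fn=max)
```

Because the map calls do not depend on each other, they could in principle run in parallel, which is one way to read the abstract's remark about compatibility with inference scaling at the system level.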
Jun-14-2026, 13:56:52 GMT
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > Experimental Study (1.00)
- Workflow (0.93)
- Industry:
- Information Technology (0.87)
- Technology: