Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Jun-13-2026, 01:17:42 GMT–Neural Information Processing Systems

Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity and the difficulty of question answering under such extended contexts. While Large Language Models (LLMs) have demonstrated considerable advancements in video analysis capabilities and long context handling, they continue to exhibit limitations when processing information-dense hour-long videos.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Jun-13-2026, 01:17:42 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)