Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
–Neural Information Processing Systems
Understanding long video content is a complex endeavor that often relies on densely sampled frame captions or end-to-end feature selectors, yet these techniques commonly overlook the logical relationships between textual queries and visual elements.
Jun-13-2026, 23:01:17 GMT