Video Abstract
Preacher: Paper-to-Video Agentic System
Liu, Jingwei | Yang, Ling | Luo, Hao | Wang, Fan | Li, Hongyan | Wang, Mengdi
The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video durations, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/Gen-Verse/Paper2Video
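The abstract outlines a top-down planning stage followed by bottom-up video synthesis. The sketch below illustrates one possible shape of that flow; every class and function name (`KeyScene`, `plan_key_scenes`, `planner.decompose`, `generator.render`, etc.) is a hypothetical placeholder, not taken from the released Preacher code.

```python
# Minimal sketch of the top-down / bottom-up flow described in the abstract.
# All identifiers here are illustrative assumptions, not the authors' API.
from dataclasses import dataclass


@dataclass
class KeyScene:
    """A planned scene aligning a paper segment with a visual concept."""
    title: str
    summary: str
    visual_plan: str


def plan_key_scenes(paper_text: str, planner) -> list[KeyScene]:
    """Top-down: decompose, summarize, and reformulate the paper into key scenes.

    `planner` stands in for an LLM-backed agent queried iteratively
    (in the spirit of the Progressive Chain of Thought), refining each
    scene plan step by step.
    """
    scenes = []
    for section in planner.decompose(paper_text):      # split into logical units
        summary = planner.summarize(section)           # distill the core idea
        visual_plan = planner.reformulate(summary)     # recast it as a visual scene
        scenes.append(KeyScene(section.title, summary, visual_plan))
    return scenes


def synthesize_video(scenes: list[KeyScene], generator) -> list[bytes]:
    """Bottom-up: render each key scene, then assemble the segments in order."""
    return [generator.render(scene.visual_plan) for scene in scenes]
```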
Turn-Taking in Commander-Robot Navigator Dialog (Video Abstract)
Cassidy, Taylor (US Army Research Laboratory) | Voss, Clare (US Army Research Laboratory) | Summers-Stay, Douglas (US Army Research Laboratory)
The accompanying video captures the multi-modal data displays and speech dialogue of a human Commander (C) and a human Robot Navigator (RN) tele-operating a mobile robot (R) in a remote, previously unexplored area. We describe unique challenges, observed in the data, for automating turn-taking and coordination processes.