VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection

Wang, Qiang, Gao, Xinyuan, Dong, SongLin, Han, Jizhou, Li, Jiangyang, He, Yuhang, Gong, Yihong

arXiv.org Artificial Intelligence 

We present VDC-Agent, a self-evolving framework for Video Detailed Captioning that requires neither human annotations nor larger teacher models. The agent forms a closed loop of caption generation, principle-guided scoring (score and textual suggestions), and prompt refinement. When caption quality regresses, a self-reflection path leverages the previous chain-of-thought to amend the update. Running this process on unlabeled videos produces trajectories of (caption, score) pairs. We convert the trajectories into preference tuples and filter out samples with JSON parsing errors, resulting in VDC-Agent-19K, which contains 18,886 automatically constructed pairs.
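
The last step of the abstract, turning (caption, score) trajectories into preference tuples and dropping samples that fail JSON parsing, can be illustrated with a minimal Python sketch. The abstract does not specify the record layout or the pairing rule, so everything here is an assumption made for illustration: the JSON fields `caption` and `score`, the names `Step`, `PreferencePair`, `parse_step`, and `trajectory_to_pair`, the choice to pair the highest-scoring caption against the lowest-scoring one, and the choice to discard a whole trajectory when any step is malformed.

```python
import json
from typing import NamedTuple, Optional


class Step(NamedTuple):
    caption: str
    score: float


class PreferencePair(NamedTuple):
    chosen: str
    rejected: str
    score_gap: float


def parse_step(raw: str) -> Optional[Step]:
    """Parse one raw JSON record; return None if it is malformed.

    Assumed (hypothetical) layout: {"caption": "...", "score": 85}
    """
    try:
        obj = json.loads(raw)
        return Step(caption=str(obj["caption"]), score=float(obj["score"]))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None


def trajectory_to_pair(raw_steps: list[str]) -> Optional[PreferencePair]:
    """Build one preference pair from one video's trajectory.

    Assumptions (not taken from the paper):
    - a trajectory with any unparseable step is dropped entirely;
    - the best-scored caption is 'chosen', the worst-scored is 'rejected';
    - trajectories with no score difference yield no pair.
    """
    steps = []
    for raw in raw_steps:
        step = parse_step(raw)
        if step is None:
            return None  # JSON parsing error: filter this sample out
        steps.append(step)

    if len(steps) < 2:
        return None

    best = max(steps, key=lambda s: s.score)
    worst = min(steps, key=lambda s: s.score)
    if best.score == worst.score:
        return None

    return PreferencePair(best.caption, worst.caption, best.score - worst.score)
```

Under these assumptions, a trajectory containing a truncated record such as `{"caption": broken` returns `None` and is excluded from the dataset, while a clean two-step trajectory yields one pair whose `score_gap` records how strongly the chosen caption was preferred.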
