ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding
arXiv.org Artificial Intelligence
In this paper, we propose ChronoForge-RL, a novel video understanding framework that combines Temporal Apex Distillation (TAD) and KeyFrame-aware Group Relative Policy Optimization (KF-GRPO) to tackle these issues. Concretely, we introduce a differentiable keyframe selection mechanism that systematically identifies semantic inflection points through a three-stage process, improving computational efficiency while preserving temporal information. Two modules then enable effective temporal reasoning. First, TAD uses variation scoring, inflection detection, and prioritized distillation to select the most informative frames. Second, KF-GRPO implements a contrastive learning paradigm with a saliency-enhanced reward mechanism that explicitly incentivizes the model to exploit both frame content and temporal relationships. ChronoForge-RL achieves 69.1% on VideoMME and 52.7% on LVBench, surpassing previous approaches and enabling our 7B-parameter model to achieve performance comparable to 72B-parameter alternatives, a 10× improvement in performance-to-parameter ratio.
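The three-stage frame selection attributed to TAD (variation scoring, inflection detection, prioritized selection) and the group-relative advantage at the core of GRPO can be illustrated with a small sketch. It is not the authors' implementation. It assumes precomputed per-frame feature vectors, Euclidean distance between consecutive frames as the variation score, strict local maxima as inflection points, and a plain top-k in place of prioritized distillation. All function names are hypothetical, and unlike the differentiable mechanism described in the abstract, this hard top-k is not differentiable. The advantage function is standard GRPO normalization (reward minus group mean, divided by group standard deviation); it does not model KF-GRPO's saliency-enhanced reward, which the abstract does not specify.

```python
import math
from typing import List, Sequence


def variation_scores(features: Sequence[Sequence[float]]) -> List[float]:
    """Stage 1 (assumed): score each frame by the Euclidean distance between
    its feature vector and the previous frame's; frame 0 gets 0.0."""
    scores = [0.0]
    for prev, cur in zip(features, features[1:]):
        scores.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur))))
    return scores


def inflection_points(scores: Sequence[float]) -> List[int]:
    """Stage 2 (assumed): treat strict interior local maxima of the variation
    score as candidate semantic inflection points."""
    return [
        i for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]
    ]


def select_keyframes(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Stage 3 (assumed, hard top-k, NOT differentiable): keep the k
    highest-scoring inflection points, top up with the highest-scoring
    remaining frames if there are fewer than k, return indices in time order."""
    scores = variation_scores(features)
    # Python's sort is stable, so ties keep their original (temporal) order.
    candidates = sorted(inflection_points(scores), key=lambda i: scores[i], reverse=True)
    chosen = candidates[:k]
    if len(chosen) < k:
        rest = sorted(
            (i for i in range(len(scores)) if i not in chosen),
            key=lambda i: scores[i],
            reverse=True,
        )
        chosen += rest[: k - len(chosen)]
    return sorted(chosen)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO advantage: normalize each sampled response's reward by
    its group's mean and (population) standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Toy 1-D "features": a static shot, a cut, a second shot, another cut.
    feats = [[0.0], [0.0], [5.0], [5.0], [5.0], [9.0], [9.0]]
    print(select_keyframes(feats, k=2))  # → [2, 5]
    # Four sampled answers for one question: two rewarded, two not.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

On the toy input the two frames right after each "cut" are selected, because they carry the largest feature change. In the advantage example, rewarded answers receive a positive advantage and unrewarded ones a negative advantage of equal magnitude, so the policy update is driven by relative quality within the group rather than by absolute reward.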
Sep-22-2025