video summarization
- Law (0.93)
- Information Technology (0.69)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (2 more...)
- North America > United States > California > Alameda County > Berkeley (0.40)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Context-Aware Pseudo-Label Scoring for Zero-Shot Video Summarization
Wu, Yuanli, Zhang, Long, Du, Yue, Li, Bin
We propose a rubric-guided, pseudo-labeled, and prompt-driven zero-shot video summarization framework that bridges large language models with structured semantic reasoning. A small subset of human annotations is converted into high-confidence pseudo labels and organized into dataset-adaptive rubrics defining clear evaluation dimensions such as thematic relevance, action detail, and narrative progression. During inference, boundary scenes, including the opening and closing segments, are scored independently based on their own descriptions, while intermediate scenes incorporate concise summaries of adjacent segments to assess narrative continuity and redundancy. This design enables the language model to balance local salience with global coherence without any parameter tuning. Across three benchmarks, the proposed method achieves stable and competitive results, with F1 scores of 57.58 on SumMe, 63.05 on TVSum, and 53.79 on QFVS, surpassing zero-shot baselines by +0.85, +0.84, and +0.37, respectively. These outcomes demonstrate that rubric-guided pseudo labeling combined with contextual prompting effectively stabilizes LLM-based scoring and establishes a general, interpretable, and training-free paradigm for both generic and query-focused video summarization.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Leisure & Entertainment (0.67)
- Education (0.46)
SummDiff: Generative Modeling of Video Summarization with Diffusion
Kim, Kwanseok, Hahm, Jaehoon, Kim, Sumin, Sul, Jinhwan, Kim, Byunghak, Lee, Joonseok
Video summarization is a task of shortening a video by choosing a subset of frames while preserving its essential moments. Despite the innate subjectivity of the task, previous works have deterministically regressed to an averaged frame score over multiple raters, ignoring the inherent subjectivity of what constitutes a "good" summary. W e propose a novel problem formulation by framing video summarization as a conditional generation task, allowing a model to learn the distribution of good summaries and to generate multiple plausible summaries that better reflect varying human perspectives. Adopting diffusion models for the first time in video summarization, our proposed method, Sum-mDiff, dynamically adapts to visual contexts and generates multiple candidate summaries conditioned on the input video. Extensive experiments demonstrate that SummDiff not only achieves the state-of-the-art performance on various benchmarks but also produces summaries that closely align with individual annotator preferences. Moreover, we provide a deeper insight with novel metrics from an analysis of the knapsack, which is an important last step of generating summaries but has been overlooked in evaluation.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Illinois (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)