SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution Qi Tang 1,2
–Neural Information Processing Systems
Diffusion-based Video Super-Resolution (VSR) is renowned for generating perceptually realistic videos, yet it grapples with maintaining detail consistency across frames due to stochastic fluctuations. The traditional approach of pixel-level alignment is ineffective for diffusion-processed frames because of iterative disruptions. To overcome this, we introduce SeeClear-a novel VSR framework leveraging conditional video generation, orchestrated by instance-centric and channel-wise semantic controls. This framework integrates a Semantic Distiller and a Pixel Condenser, which synergize to extract and upscale semantic details from low-resolution frames. The Instance-Centric Alignment Module (InCAM) utilizes video-clip-wise tokens to dynamically relate pixels within and across frames, enhancing coherency.
Neural Information Processing Systems
Mar-27-2025, 14:58:51 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Information Technology (0.46)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.67)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Machine Learning > Neural Networks
- Communications (0.68)
- Data Science (0.68)
- Sensing and Signal Processing > Image Processing (1.00)
- Artificial Intelligence
- Information Technology