Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Jun-22-2026, 04:45:06 GMT–Neural Information Processing Systems

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly reuses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4× diffusion SR model wrapped in CoZ attains beyond 256× enlargement with high perceptual quality and fidelity.
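
The chained reuse of a fixed-scale backbone described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not the authors' implementation: `nearest_upscale` replaces the diffusion SR backbone, `dummy_prompt` replaces the VLM prompt extractor, and the centre-crop-then-upscale step is an assumed way of zooming so that each scale-state keeps the same pixel size. The only point it demonstrates is that applying a 4× step four times compounds to 4^4 = 256× magnification.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def nearest_upscale(img: Image, prompt: str, factor: int) -> Image:
    """Hypothetical stand-in for the SR backbone: nearest-neighbour upsampling.

    The prompt is accepted but ignored; a real backbone would condition on it.
    """
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def dummy_prompt(states: List[Image]) -> str:
    """Hypothetical stand-in for the VLM: it sees all earlier scale-states."""
    return f"zoom level {len(states)}"


def chain_of_zoom(
    img: Image,
    sr_step: Callable[[Image, str, int], Image],
    prompt_fn: Callable[[List[Image]], str],
    factor: int,
    num_steps: int,
) -> List[Image]:
    """Apply the same fixed-scale SR step repeatedly, one scale-state per step."""
    states = [img]
    for _ in range(num_steps):
        prompt = prompt_fn(states)                 # text guidance from earlier states
        region = center_crop(states[-1], factor)   # assumed zoom target: the centre
        states.append(sr_step(region, prompt, factor))
    return states


def total_magnification(factor: int, num_steps: int) -> int:
    """Magnification compounds multiplicatively across the chain."""
    return factor ** num_steps


if __name__ == "__main__":
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    chain = chain_of_zoom(image, nearest_upscale, dummy_prompt, factor=4, num_steps=4)
    print(len(chain), total_magnification(4, 4))
```

Because each step only ever asks the backbone for its native scale factor, the backbone stays within the regime it was trained on; the chain length, not the model, determines the final magnification.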

machine learning, natural language, vlm

Country:
- Europe (0.46)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.88)

Industry:
- Health & Medicine > Diagnostic Medicine (0.67)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
