DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Jun-22-2026, 01:06:54 GMT–Neural Information Processing Systems

Diffusion Transformer (DiT), a promising diffusion model for visual generation, demonstrates impressive performance but incurs significant computational overhead. Intriguingly, analysis of pre-trained DiT models reveals that global selfattention is often redundant, predominantly capturing local patterns--highlighting the potential for more efficient alternatives. In this paper, we revisit convolution as an alternative building block for constructing efficient and expressive diffusion models. However, naively replacing self-attention with convolution typically results in degraded performance. Our investigations attribute this performance gap to the higher channel redundancy in ConvNets compared to Transformers. To resolve this, we introduce a compact channel attention mechanism that promotes the activation of more diverse channels, thereby enhancing feature diversity.

diffusion model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Jun-22-2026, 01:06:54 GMT

Conferences PDF

Add feedback

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found