Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Dec-26-2025, 16:04:32 GMT, Neural Information Processing Systems

Disentangled representation learning strives to extract the intrinsic factors within the observed data. Factoring these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that diffusion models with cross attention can themselves serve as a powerful inductive bias that facilitates the learning of disentangled representations. We propose to encode an image into a set of concept tokens and treat them as the condition of a latent diffusion model for image reconstruction, where cross attention over the concept tokens bridges the encoder and the U-Net of the diffusion model. Our analysis shows that the diffusion process inherently possesses time-varying information bottlenecks.
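The bridging step described in the abstract can be illustrated with a minimal NumPy sketch of cross attention, in which flattened U-Net feature positions act as queries and the concept tokens supply keys and values. This is not the authors' implementation: the function name, the projection matrices, the residual connection, and all sizes below are assumptions chosen only for illustration.

```python
import numpy as np

def cross_attention(features, tokens, w_q, w_k, w_v):
    """Single-head cross attention (illustrative sketch only).

    features: (P, C) flattened U-Net feature map, one row per spatial position
    tokens:   (N, D) concept tokens produced by an image encoder
    w_q, w_k, w_v: hypothetical projection matrices of shapes (C, A), (D, A), (D, C)
    """
    q = features @ w_q                            # queries from U-Net features, (P, A)
    k = tokens @ w_k                              # keys from concept tokens, (N, A)
    v = tokens @ w_v                              # values from concept tokens, (N, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # scaled dot products, (P, N)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    # Residual connection is an assumption, common in U-Net attention blocks.
    return features + weights @ v, weights

# Hypothetical sizes: 16 spatial positions, 8 channels, 4 concept tokens.
rng = np.random.default_rng(0)
P, C, N, D, A = 16, 8, 4, 6, 5
features = rng.normal(size=(P, C))
tokens = rng.normal(size=(N, D))
w_q = rng.normal(size=(C, A))
w_k = rng.normal(size=(D, A))
w_v = rng.normal(size=(D, C))

out, weights = cross_attention(features, tokens, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over the concept tokens for one spatial position, so the image content reaches the denoiser only through the tokens, which is the conditioning pathway the abstract describes.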

artificial intelligence, machine learning, proceedings

Country:
- Asia > China > Guangxi Province > Nanning (0.07)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)