Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

Akan, Adil Kaan, Yemez, Yucel

arXiv.org Artificial Intelligence

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from the adapter layers with slot attention. This improves the alignment of the model with the objects in the input image without external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate experimentally that our method performs remarkably well on complex real-world images for compositional generation, in contrast to other slot-based generative methods in the literature.

The real world is inherently structured with distinct, composable parts and objects that can be combined in various ways; this compositional characteristic is essential for cognitive functions such as reasoning, understanding causality, and the ability to generalize beyond training data (Lake et al., 2017; Bottou, 2014; Schölkopf et al., 2021; Bahdanau et al., 2019; Fodor & Pylyshyn, 1988). While language clearly reflects this modularity through sentences made up of distinct words and tokens, the compositional structure is less obvious in visual data. Object-centric learning (OCL) offers a promising approach to uncovering this latent structure by grouping related features into coherent object representations without supervision (Kahneman et al., 1992; Greff et al., 2020). By decomposing complex scenes into separate objects and their interactions, OCL mimics how humans interpret their environment (Spelke & Kinzler, 2007), potentially improving the robustness and interpretability of AI systems (Lake et al., 2017; Schölkopf et al., 2021).
This approach shifts from traditional pixel-based feature extraction to a more meaningful segmentation of visual data, which is key to better generalization and to supporting high-level reasoning tasks. Recent advances in OCL have shown the potential of incorporating powerful generative models, such as transformers and diffusion models, into the OCL framework as image decoders. Notably, models such as Latent Slot Diffusion (LSD) (Jiang et al., 2023) and SlotDiffusion (Wu et al., 2023b) have considerably improved performance on object discovery and visual generation tasks in real-world settings by employing slot-conditioned diffusion models.
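The core conditioning mechanism described above, image features attending to slots via cross-attention, with an auxiliary loss aligning that attention map to slot-attention masks, can be illustrated with a minimal numpy sketch. This is not the papers' actual implementation: the function names, projection matrices, and the cross-entropy form of the guidance loss are simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_cross_attention(feats, slots, Wq, Wk, Wv):
    """Cross-attention where image features attend to slots.

    feats: (N, D) flattened image features (queries).
    slots: (K, D) slot vectors (keys/values), as in an adapter layer.
    Returns the updated features (N, d) and the (N, K) attention map.
    """
    q = feats @ Wq                                           # (N, d)
    k = slots @ Wk                                           # (K, d)
    v = slots @ Wv                                           # (K, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, K)
    return attn @ v, attn

def guidance_loss(adapter_attn, slot_attn_masks, eps=1e-8):
    """Illustrative guidance loss: cross-entropy pushing the adapter's
    per-feature attention distribution over slots toward the
    slot-attention masks (both (N, K), rows summing to 1)."""
    return -np.mean(np.sum(slot_attn_masks * np.log(adapter_attn + eps), axis=-1))
```

In this simplified view, minimizing `guidance_loss` encourages each spatial feature to attend to the same slot in the adapter layers as it does in the slot-attention module, which is one plausible way to realize the alignment objective described in the abstract.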



SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Wu, Ziyi, Hu, Jingyu, Lu, Wuyue, Gilitschenski, Igor, Garg, Animesh

arXiv.org Artificial Intelligence

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent approaches have made significant progress in unsupervised object discovery. In addition, slot-based representations hold great potential for generative modeling, such as controllable image generation and object manipulation in image editing. However, current slot-based methods often produce blurry images and distorted objects, exhibiting poor generative modeling capabilities. In this paper, we focus on improving slot-to-image decoding, a crucial aspect for high-quality visual generation. We introduce SlotDiffusion -- an object-centric Latent Diffusion Model (LDM) designed for both image and video data. Thanks to the powerful modeling capacity of LDMs, SlotDiffusion surpasses previous slot models in unsupervised object segmentation and visual generation across six datasets. Furthermore, our learned object features can be utilized by existing object-centric dynamics models, improving video prediction quality and downstream temporal reasoning tasks. Finally, we demonstrate the scalability of SlotDiffusion to unconstrained real-world datasets such as PASCAL VOC and COCO, when integrated with self-supervised pre-trained image encoders.
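The slot-based representation both papers build on comes from Slot Attention (Locatello et al., 2020), in which a fixed set of slot vectors compete for input features via attention and are iteratively refined. The sketch below is a deliberately stripped-down numpy version: it keeps only the competitive softmax over slots and the weighted-mean update, omitting the learned projections, GRU update, LayerNorm, and residual MLP of the original algorithm.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, slots, iters=3, eps=1e-8):
    """Simplified slot-attention refinement (illustrative only).

    inputs: (N, D) encoded image features.
    slots:  (K, D) initial slot vectors (e.g. sampled from a Gaussian).
    Returns refined slots (K, D) and the final (N, K) assignment weights.
    """
    D = inputs.shape[-1]
    for _ in range(iters):
        # Softmax over the slot axis: slots compete for each feature.
        attn = softmax(inputs @ slots.T / np.sqrt(D), axis=-1)  # (N, K)
        # Normalize per slot so the update is a weighted mean of features.
        attn = attn / (attn.sum(axis=0, keepdims=True) + eps)
        slots = attn.T @ inputs                                 # (K, D)
    return slots, attn
```

The resulting per-slot assignment weights are what slot-based methods read out as unsupervised object segmentation masks, and the refined slot vectors are what a slot-conditioned decoder, such as the LDM in SlotDiffusion, maps back to pixels.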