AITopics | Ostashev, Daniil

Collaborating Authors

Ostashev, Daniil

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Object-level Visual Prompts for Compositional Image Generation

Parmar, Gaurav, Patashnik, Or, Wang, Kuan-Chieh, Ostashev, Daniil, Narasimhan, Srinivasa, Zhu, Jun-Yan, Cohen-Or, Daniel, Aberman, Kfir

arXiv.org Artificial IntelligenceJan-2-2025

We introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the versatility and expressiveness offered by text prompts. A key challenge in this task is to preserve the identity of the objects depicted in the input visual prompts, while also generating diverse compositions across different images. To address this challenge, we introduce a new KV-mixed cross-attention mechanism, in which keys and values are learned from distinct visual representations. The keys are derived from an encoder with a small bottleneck for layout control, whereas the values come from a larger bottleneck encoder that captures fine-grained appearance details. By mixing keys and values from these complementary sources, our model preserves the identity of the visual prompts while supporting flexible variations in object arrangement, pose, and composition. During inference, we further propose object-level compositional guidance to improve the method's identity preservation and layout correctness. Results show that our technique produces diverse scene compositions that preserve the unique characteristics of each visual prompt, expanding the creative potential of text-to-image generation.

artificial intelligence, machine learning, visual prompt, (17 more...)

arXiv.org Artificial Intelligence

2501.01424

Country:

Asia > Middle East (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Nested Attention: Semantic-aware Attention Values for Concept Personalization

Patashnik, Or, Gal, Rinon, Ostashev, Daniil, Tulyakov, Sergey, Aberman, Kfir, Cohen-Or, Daniel

arXiv.org Artificial IntelligenceJan-2-2025

Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often face challenges in maintaining a balance between identity preservation and alignment with the input text prompt. Some methods rely on a single textual token to represent a subject, which limits expressiveness, while others employ richer representations but disrupt the model's prior, diminishing prompt alignment. In this work, we introduce Nested Attention, a novel mechanism that injects a rich and expressive image representation into the model's existing cross-attention layers. Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image. We integrate these nested layers into an encoder-based personalization method, and show that they enable high identity preservation while adhering to input text prompts. Our approach is general and can be trained on various domains. Additionally, its prior preservation allows us to combine multiple personalized subjects from different domains in a single image.

input image, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2501.01407

Country:

Asia > Middle East (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Wang, Kuan-Chieh, Ostashev, Daniil, Fang, Yuwei, Tulyakov, Sergey, Aberman, Kfir

arXiv.org Artificial IntelligenceMay-6-2024

We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model's pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.11565

Country: Asia > Middle East (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback