AITopics | diffusion feature

Collaborating Authors

diffusion feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques

Neural Information Processing Systems, Mar-18-2026, 23:42:47 GMT

Diffusion models are powerful generative models, and this capability can also be applied to discrimination. The inner activations of a pre-trained diffusion model can serve as features for discriminative tasks; these are referred to as diffusion features. We discover that diffusion features have been hindered by a hidden yet universal phenomenon that we call content shift. Specifically, the features differ in content from the input image, for example in the exact shape of a certain object. We trace the cause of content shift to an inherent characteristic of diffusion models, which suggests that the phenomenon is broadly present in diffusion features. Further empirical study indicates that its negative impact is not negligible even when content shift is not visually perceivable.
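The general idea of reading discriminative features out of a generative model's inner activations can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the paper's method: it noises an input with the standard DDPM forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, passes the result through a tiny randomly initialized network that stands in for a pre-trained denoiser, and returns the hidden-layer activation as the "diffusion feature". The linear beta schedule and the names `alpha_bar`, `noise_input`, `ToyDenoiser`, and `extract_feature` are assumptions made for illustration.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 0..t under an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_input(x0, t, rng):
    """DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with eps ~ N(0, 1)."""
    a = alpha_bar(t)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


class ToyDenoiser:
    """Hypothetical stand-in for a pre-trained denoiser: input -> hidden (ReLU) -> noise prediction."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def forward(self, x):
        # The hidden activation is what a feature extractor would read out.
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w1]
        eps_pred = [sum(w * v for w, v in zip(row, h)) for row in self.w2]
        return eps_pred, h


def extract_feature(model, x0, t, seed=0):
    """Noise the input to timestep t, run the denoiser once, and return its inner activation."""
    rng = random.Random(seed)
    xt = noise_input(x0, t, rng)
    _, hidden = model.forward(xt)
    return hidden


model = ToyDenoiser(dim=8, hidden=16)
image = [0.5] * 8  # a flattened toy "image"
feat = extract_feature(model, image, t=50)
print(len(feat))  # 16
```

In a real pipeline the timestep and the layer that is read out are design choices that affect feature quality; the toy network here only shows where such an activation comes from.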

artificial intelligence, diffusion feature, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.85)


COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

Neural Information Processing Systems, Feb-17-2026, 11:22:26 GMT

Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video in a zero-shot manner.
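The title refers to correspondences computed from diffusion features. A common, generic way to obtain such correspondences, shown here only as a hedged sketch and not as COVE's actual procedure, is to compare the per-location feature vectors of two frames by cosine similarity and, for each location in the source frame, take the most similar location in the target frame. The function names `cosine` and `correspond` and the tiny hand-written feature maps are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def correspond(src_feats, tgt_feats):
    """For each source location, return (index of best-matching target location, similarity).

    src_feats, tgt_feats: lists of feature vectors, one per spatial location
    (e.g. a flattened H x W feature map taken from some layer of a diffusion model).
    """
    matches = []
    for f in src_feats:
        sims = [cosine(f, g) for g in tgt_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches


# Toy 3-location "feature maps"; frame2 is roughly frame1 with its locations permuted.
frame1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame2 = [[0.0, 0.9, 0.1], [0.1, 0.0, 1.0], [1.0, 0.1, 0.0]]
print([idx for idx, _ in correspond(frame1, frame2)])  # [2, 0, 1]
```

Nearest-neighbour matching like this is the simplest choice; practical systems often add constraints such as mutual consistency or restricting the search to a local window.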

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Video Diffusion Models are Training-free Motion Interpreter and Controller

Neural Information Processing Systems, Feb-16-2026, 12:08:40 GMT

Leveraging MOFT, we propose a novel training-free video motion control framework.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)


Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques

Neural Information Processing Systems, Feb-9-2026, 09:35:40 GMT

We discover that diffusion features have been hindered by a hidden yet universal phenomenon that we call content shift.

artificial intelligence, content shift, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Video Diffusion Models are Training-free Motion Interpreter and Controller

Neural Information Processing Systems, Oct-10-2025, 08:53:04 GMT

Leveraging MOFT, we propose a novel training-free video motion control framework.

arxiv preprint arxiv, diffusion model, information, (15 more...)

Neural Information Processing Systems

Country: Asia (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)


Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques

Neural Information Processing Systems, Oct-9-2025, 20:47:22 GMT

Despite its simplicity, the proposed approach achieves superior results on various tasks and datasets, validating its potential as a generic booster for diffusion features.

content shift, diffusion feature, diffusion model, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)


RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning

Zheng, Chengyu, Huang, Jin, Chen, Honghua, Wei, Mingqiang

arXiv.org Artificial Intelligence, Jul-29-2025

Recent research leveraging large-scale pretrained diffusion models has demonstrated the potential of using diffusion features to establish semantic correspondences in images. Inspired by advances in diffusion-based techniques, we propose a novel zero-shot method for refining point cloud registration algorithms. Our approach leverages correspondences derived from depth images to enhance point feature representations, eliminating the need for a dedicated training dataset. Specifically, we first project the point cloud into depth maps from multiple perspectives and extract implicit knowledge from a pretrained diffusion network as depth diffusion features. These features are then integrated with geometric features obtained from existing methods to establish more accurate correspondences between point clouds. By leveraging these refined correspondences, our approach achieves significantly improved registration accuracy. Extensive experiments demonstrate that our method not only enhances the performance of existing point cloud registration techniques but also generalizes robustly across diverse datasets. Code is available at https://github.com/zhengcy-lambo/RARE.git.
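The first step described above, projecting a point cloud into a depth map from a given viewpoint, can be sketched with a pinhole camera and a z-buffer that keeps the nearest point per pixel. This is an assumed, generic projection for illustration; the intrinsics (`fx`, `fy`, `cx`, `cy`), the image size, and the function name `project_to_depth` are hypothetical, and RARE's actual rendering and multi-view setup may differ. Points are assumed to already be expressed in the camera frame, with +z pointing forward; other viewpoints would be obtained by rotating and translating the cloud before calling the function.

```python
import math


def project_to_depth(points, fx, fy, cx, cy, width, height):
    """Render camera-frame 3D points into a depth map (math.inf where no point lands).

    A pinhole model maps (x, y, z) to pixel (u, v) = (fx * x / z + cx, fy * y / z + cy).
    When several points fall on the same pixel, the z-buffer keeps the closest one.
    """
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera: not visible
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
    return depth


pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (0.0, 0.0, -1.0)]
dm = project_to_depth(pts, fx=2.0, fy=2.0, cx=2.0, cy=2.0, width=5, height=5)
print(dm[2][2], dm[2][3])  # 1.0 1.0
```

Here the first two points land on the same pixel and the closer one (depth 1.0) wins, while the point with negative depth is discarded.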

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2507.1995

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor

Agarwal, Vatsal, Gwilliam, Matthew, Kohavi, Gefen, Verma, Eshan, Ulbricht, Daniel, Shrivastava, Abhinav

arXiv.org Artificial Intelligence, Jul-10-2025

Recent advances in multimodal large language models (MLLMs) have enabled image-based question-answering capabilities. However, a key limitation is the use of CLIP as the visual encoder: while it can capture coarse global information, it can often miss fine-grained details that are relevant to the input query. To address these shortcomings, this work studies whether pre-trained text-to-image diffusion models can serve as instruction-aware visual encoders. Through an analysis of their internal representations, we find that diffusion features are rich in semantics and encode strong image-text alignment. Moreover, we find that text conditioning can be leveraged to focus the model on regions relevant to the input question. We then investigate how to align these features with large language models and uncover a leakage phenomenon, in which the LLM can inadvertently recover information from the original diffusion prompt. We analyze the causes of this leakage and propose a mitigation strategy. Based on these insights, we explore a simple fusion strategy that utilizes both CLIP and conditional diffusion features. We evaluate our approach on both general VQA and specialized MLLM benchmarks, demonstrating the promise of diffusion models for visual understanding, particularly in vision-centric tasks that require spatial and compositional reasoning. Our project page can be found at https://vatsalag99.github.io/mustafar/.
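The abstract mentions a simple fusion strategy that uses both CLIP and conditional diffusion features but does not spell out the operator here. As a hedged sketch of one straightforward option (an assumption, not necessarily the authors' design), per-token features from the two encoders can be L2-normalized, concatenated along the channel dimension, and linearly projected to the language model's embedding width. The dimensions, the random projection weights, and the names `l2_normalize` and `fuse_tokens` are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length so neither encoder dominates by magnitude."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def fuse_tokens(clip_tokens, diff_tokens, proj):
    """Concatenate normalized CLIP and diffusion token features, then project.

    clip_tokens: N vectors of size Dc; diff_tokens: N vectors of size Dd
    (assumed already resampled to the same N tokens); proj: Dl rows of size Dc + Dd.
    Returns N vectors of size Dl, one visual token per position for the LLM.
    """
    if len(clip_tokens) != len(diff_tokens):
        raise ValueError("both encoders must provide the same number of tokens")
    fused = []
    for c, d in zip(clip_tokens, diff_tokens):
        z = l2_normalize(c) + l2_normalize(d)  # channel-wise concatenation of two lists
        fused.append([sum(w * x for w, x in zip(row, z)) for row in proj])
    return fused


rng = random.Random(0)
N, Dc, Dd, Dl = 4, 6, 10, 8
clip = [[rng.gauss(0, 1) for _ in range(Dc)] for _ in range(N)]
diff = [[rng.gauss(0, 1) for _ in range(Dd)] for _ in range(N)]
proj = [[rng.gauss(0, 0.1) for _ in range(Dc + Dd)] for _ in range(Dl)]
out = fuse_tokens(clip, diff, proj)
print(len(out), len(out[0]))  # 4 8
```

In a trained system the projection would be learned rather than random; normalizing before concatenation is one way to keep the two feature sources on a comparable scale.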

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.07106

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval

Neural Information Processing Systems, May-27-2025, 20:11:07 GMT

Personalized retrieval and segmentation aim to locate specific instances within a dataset based on an input image and a short description of the reference instance. While supervised methods are effective, they require extensive labeled data for training. Recently, self-supervised foundation models have been applied to these tasks, showing results comparable to those of supervised methods. However, these models have a significant flaw: they struggle to locate the desired instance when other instances of the same class are present. In this paper, we explore text-to-image diffusion models for these tasks. Specifically, we propose a novel approach called PDM, for Personalized Diffusion Features Matching, which leverages intermediate features of pre-trained text-to-image models for personalization tasks without any additional training.
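The matching idea can be illustrated generically: average the reference instance's feature vectors inside its mask into a prototype, score every location of a target image by cosine similarity to that prototype, threshold the scores to get a segmentation mask, and use the best score to rank images for retrieval. This is a simplified sketch under stated assumptions, not PDM itself; the names `cosine`, `prototype`, and `match`, the 0.8 threshold, and the toy feature values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def prototype(ref_feats, ref_mask):
    """Average the reference features at masked locations into one instance descriptor."""
    selected = [f for f, m in zip(ref_feats, ref_mask) if m]
    dim = len(selected[0])
    return [sum(f[i] for f in selected) / len(selected) for i in range(dim)]


def match(proto, tgt_feats, threshold=0.8):
    """Return (binary mask over target locations, best similarity score for retrieval)."""
    sims = [cosine(proto, g) for g in tgt_feats]
    mask = [s >= threshold for s in sims]
    return mask, max(sims)


ref = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # per-location reference features
ref_mask = [True, True, False]              # the instance occupies locations 0 and 1
proto = prototype(ref, ref_mask)            # approximately [0.95, 0.05]
gallery = {
    "img_a": [[0.0, 1.0], [1.0, 0.05]],
    "img_b": [[0.1, 1.0], [0.2, 0.9]],
}
scores = {name: match(proto, feats)[1] for name, feats in gallery.items()}
print(max(scores, key=scores.get))  # img_a
```

A single averaged prototype is the simplest descriptor; it cannot by itself separate two instances of the same class that look alike, which is exactly the failure case the abstract highlights.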

artificial intelligence, machine learning, personalized segmentation and retrieval, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

