 diffuse


Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Neural Information Processing Systems

Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. The code implementation is available at https://github.com/soinx0629/vis


Freeze, Diffuse, Decode: Geometry-Aware Adaptation of Pretrained Transformer Embeddings for Antimicrobial Peptide Design

arXiv.org Artificial Intelligence

Pretrained transformers provide rich, general-purpose embeddings, which are transferred to downstream tasks. However, the current transfer strategies, fine-tuning and probing, either distort the pretrained geometric structure of the embeddings or lack sufficient expressivity to capture task-relevant signals. These issues become even more pronounced when supervised data are scarce. Here, we introduce Freeze, Diffuse, Decode (FDD), a novel diffusion-based framework that adapts pretrained embeddings to downstream tasks while preserving their underlying geometric structure. FDD propagates supervised signal along the intrinsic manifold of frozen embeddings, enabling a geometry-aware adaptation of the embedding space. Applied to antimicrobial peptide design, FDD yields low-dimensional, predictive, and interpretable representations that support property prediction, retrieval, and latent-space interpolation.


I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

arXiv.org Artificial Intelligence

This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). Existing multimodal diffusion finetuning methods largely focus on pixel-level reconstruction rather than in-context reasoning, and are constrained by the complexity and limited availability of reasoning-based datasets. ThinkDiff addresses these challenges by leveraging vision-language training as a proxy task, aligning VLMs with the decoder of an encoder-decoder large language model (LLM) instead of a diffusion decoder. This proxy task builds on the observation that the LLM decoder shares the same input feature space with diffusion decoders that use the corresponding LLM encoder for prompt embedding. As a result, aligning VLMs with diffusion decoders can be simplified through alignment with the LLM decoder. Without complex training and datasets, ThinkDiff effectively unleashes understanding, reasoning, and composing capabilities in diffusion models. Experiments demonstrate that ThinkDiff significantly improves accuracy from 19.2% to 46.3% on the challenging CoBSAT benchmark for multimodal in-context reasoning generation, with only 5 hours of training on 4 A100 GPUs. Additionally, ThinkDiff demonstrates exceptional performance in composing multiple images and texts into logically coherent images. Project page: https://mizhenxing.github.io/ThinkDiff.


Diffusion-based Virtual Fixtures

arXiv.org Artificial Intelligence

For a long time, robotics considered objects in the environment primarily as obstacles and the goal was to avoid contact due to modeling and sensing difficulties. However, the trend has shifted towards embracing contact due to increasing interest in manipulation, tactile robotics, and surface inspection tasks. Consequently, robots physically interact with their surrounding environment, which can be characterized by curved surfaces that can also be soft and fragile (e.g., surgical robotics). However, safety in these tasks remains a major concern during deployment in the real world, as they involve forceful interactions. Considering that a significant percentage of recent approaches propose learning-based controllers, and that the majority of shared control and teleoperation tasks depend on the operator's expertise or skills, safety takes a more central role in assistive systems.

[...] specifying only the target and obstacle regions, a smooth flow field on the tangent space can guide agents to the closest target while avoiding the restricted zones and maintaining contact with the surface, as depicted in Figure 1-c. For addressing these challenges, we propose a surface virtual fixture method expecting surfaces as possibly noisy and partial point clouds collected in runtime using an off-the-shelf camera attached to the robot. Next, we segment the point cloud into a set of regions with their specified behavior. This segmentation can come from learning-based methods using vision [7] or geometry [8]. Alternatively, one can use virtual or real-world expert annotations [5], [6], possibly in combination with [...]


GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis

arXiv.org Artificial Intelligence

Decoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting effects. To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework. The fundamental idea is to view the rendering tasks under various lighting positions as a multi-task learning problem, which our meta-learning approach effectively addresses by generalizing the learned Gaussian geometries not only across different viewpoints but also across diverse light positions. Experimental results demonstrate the effectiveness of our approach in terms of training efficiency and rendering quality compared to existing methods for free-viewpoint relighting.
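
The ambient, diffuse, and specular split mentioned in this abstract comes from the classic Blinn-Phong shading model. As a rough illustration of that decomposition only (not the GS-Phong method itself), here is a minimal sketch for a single surface point under one light. The function name `blinn_phong` and the coefficients `ka`, `kd`, `ks`, and `shininess` are illustrative assumptions, not values from the paper.

```python
import math

def normalize(v):
    """Return v scaled to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0.0 else tuple(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(normal, to_light, to_view,
                ka=0.1, kd=0.7, ks=0.2, shininess=32.0):
    """Scalar Blinn-Phong terms for one point (all coefficients are assumed values).

    normal   -- surface normal at the point
    to_light -- direction from the point toward the light
    to_view  -- direction from the point toward the camera
    Returns (ambient, diffuse, specular) intensities.
    """
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_view)
    # Half vector between the light and view directions.
    h = normalize(tuple(a + b for a, b in zip(l, v)))

    ambient = ka                          # constant, independent of geometry
    n_dot_l = max(dot(n, l), 0.0)
    diffuse = kd * n_dot_l                # Lambertian term
    # No highlight when the light is behind the surface.
    specular = ks * max(dot(n, h), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    return ambient, diffuse, specular

if __name__ == "__main__":
    print(blinn_phong((0, 0, 1), (0, 1, 1), (0, 0, 1)))
```

Only the diffuse and specular terms depend on the light direction, which is why separating them from geometry matters when the light position changes.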


Label-Efficient Model Selection for Text Generation

arXiv.org Artificial Intelligence

Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models. We introduce DiffUse, an efficient method to make an informed decision between candidate text generation models. DiffUse reduces the required amount of preference annotations, thus saving valuable time and resources in performing evaluation. DiffUse intelligently selects instances by clustering embeddings that represent the semantic differences between model outputs. Thus, it is able to identify a subset of examples that are more informative for preference decisions. Our method is model-agnostic, and can be applied to any text generation model. Moreover, we propose a practical iterative approach for dynamically determining how many instances to annotate. In a series of experiments over hundreds of model pairs, we demonstrate that DiffUse can dramatically reduce the required number of annotations -- by up to 75% -- while maintaining high evaluation reliability.
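
The selection idea described in this abstract (cluster embeddings of the differences between two models' outputs, then annotate a representative subset) can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the clustering here is a small, deterministic k-means chosen for brevity, the abstract does not name the clustering algorithm, and `select_instances` and the toy embeddings are hypothetical.

```python
import math

def diff_vectors(emb_a, emb_b):
    """Per-instance difference between the two models' output embeddings."""
    return [[x - y for x, y in zip(a, b)] for a, b in zip(emb_a, emb_b)]

def dist(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def kmeans(points, k, iters=20):
    """Tiny k-means stand-in (assumed technique); requires k <= len(points)."""
    centers = [list(p) for p in points[:k]]   # deterministic init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign

def select_instances(emb_a, emb_b, k):
    """Pick one instance index per cluster: the point closest to its center."""
    diffs = diff_vectors(emb_a, emb_b)
    centers, assign = kmeans(diffs, k)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            chosen.append(min(idx, key=lambda i: dist(diffs[i], centers[c])))
    return sorted(chosen)

if __name__ == "__main__":
    # Toy 2-D "embeddings" for four instances from two hypothetical models.
    emb_a = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    emb_b = [[0.0, 0.0]] * 4
    print(select_instances(emb_a, emb_b, k=2))
```

Only the selected indices would then be sent for preference annotation, so the annotation budget is set by `k` rather than by the size of the dataset.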


How the leopard got its spots: Age-old question of how animals develop their patterns may have finally been solved - with the aid of British computer pioneer Alan Turing

Daily Mail - Science & tech

From spotty leopards to stripy zebras, nature has no shortage of distinct patterns on animals and plants. Now, the age-old question of how these patterns developed may have finally been solved. Scientists have shown that the same physical process that helps remove dirt from laundry could play a role in how tropical fish get their colourful spots and stripes. For their study, the team at the University of Colorado Boulder drew on the groundbreaking work of British computer pioneer Alan Turing, dating back more than 70 years. They believe their findings could help develop new materials and even new drugs.


AI Is Like … Nuclear Weapons?

The Atlantic - Technology

The concern, as Edward Teller saw it, was quite literally the end of the world. He had run the calculations, and there was a real possibility, he told his Manhattan Project colleagues in 1942, that when they detonated the world's first nuclear bomb, the blast would set off a chain reaction. All life on Earth would be incinerated. Some of Teller's colleagues dismissed the idea, but others didn't. If there were even a slight possibility of atmospheric ignition, said Arthur Compton, the director of a Manhattan Project lab in Chicago, all work on the bomb should halt.


ElC-OIS: Ellipsoidal Clustering for Open-World Instance Segmentation on LiDAR Data

arXiv.org Artificial Intelligence

Open-world Instance Segmentation (OIS) is a challenging task that aims to accurately segment every object instance appearing in the current observation, regardless of whether these instances have been labeled in the training set. This is important for safety-critical applications such as robust autonomous navigation. In this paper, we present a flexible and effective OIS framework for LiDAR point clouds that can accurately segment both known and unknown instances (i.e., seen and unseen instance categories during training). It first identifies points belonging to known classes and removes the background by leveraging closed-set panoptic segmentation networks. Then, we propose a novel ellipsoidal clustering method that is better adapted to the characteristics of LiDAR scans and allows precise segmentation of unknown instances. Furthermore, a diffuse searching method is proposed to handle the common over-segmentation problem present in known instances. With the combination of these techniques, we are able to achieve accurate segmentation for both known and unknown instances. We evaluated our method on the SemanticKITTI open-world LiDAR instance segmentation dataset. The experimental results suggest that it outperforms current state-of-the-art methods, especially with a 10.0% improvement in association quality. The source code of our method will be publicly available at https://github.com/nubot-nudt/ElC-OIS.