CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Neural Information Processing Systems

Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. We break the problem down into two causes: concept ignorance and concept mismapping. To tackle the two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. First, we introduce a novel image-to-text concept activation module to guide the diffusion model in revisiting ignored concepts. Additionally, an attribute concentration module is proposed to correctly map the text conditions of each entity to its corresponding image area. Extensive experimental evaluations across three distinct text-to-image alignment benchmarks demonstrate the superior efficacy of our proposed method, CoMat-SDXL, over the baseline model, SDXL [49]. We also show that our method enhances general condition utilization capability and generalizes to long and complex prompts despite not being specifically trained on them. The code is available at https://github.com/CaraJ7/CoMat.
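
As a rough illustration of the image-to-text concept activation idea, the sketch below scores a generated image with a frozen captioning model and treats the log-likelihood of the prompt tokens as the fine-tuning signal: concepts the captioner cannot recover from the image get low likelihood and therefore high loss. The captioner interface is a hypothetical assumption, not the authors' implementation.

import torch

def concept_activation_loss(captioner, image, prompt_token_ids):
    """Negative log-likelihood of the prompt tokens given the image.

    captioner: frozen image-to-text model (hypothetical interface)
               returning per-token logits of shape (B, T, V) from
               captioner(image, token_ids).
    """
    logits = captioner(image, prompt_token_ids)                # (B, T, V)
    log_probs = torch.log_softmax(logits, dim=-1)
    token_ll = log_probs.gather(
        -1, prompt_token_ids.unsqueeze(-1)).squeeze(-1)        # (B, T)
    return -token_ll.mean()  # low likelihood => concept was ignored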


ReFT: Representation Finetuning for Language Models
Zhengxuan Wu, Zheng Wang, Atticus Geiger

Neural Information Processing Systems

Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency. Both are drop-in replacements for existing PEFTs and learn interventions that are 15×–65× more parameter-efficient than LoRA.
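
Concretely, the paper defines the LoReFT intervention as h' = h + R^T(Wh + b - Rh), where R is a low-rank projection with orthonormal rows. A minimal PyTorch sketch (class and argument names are our own):

import torch
import torch.nn as nn

class LoReFT(nn.Module):
    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # R: rows constrained to be orthonormal via parametrization.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(d_model, rank, bias=False))
        self.W = nn.Linear(d_model, rank)  # learned projection W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Edit h only inside the rank-dimensional subspace spanned by R:
        # h + R^T (W h + b - R h)
        Rh = self.R(h)
        return h + (self.W(h) - Rh) @ self.R.weight

The base model stays frozen; only the intervention's parameters (roughly 2 * rank * d_model per edited layer) are trained, which is where the parameter-efficiency gain comes from.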


WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Neural Information Processing Systems

Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses, motivating methods for lifelong model editing. Where the updated knowledge resides in memory is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge held in neural network activations/representations via retrieval) leads to an impossible triangle: reliability, generalization, and locality cannot all be achieved together in the lifelong editing setting. For long-term memory, directly editing the parameters causes conflicts with irrelevant pretrained knowledge and with previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between these memories.
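
A hedged sketch of the routing idea between memories: edits are trained into a side copy of a feed-forward block while the pretrained weights stay frozen, and a simple router decides, token by token, which memory answers. The threshold router below is an illustrative simplification, not the paper's exact mechanism.

import copy
import torch
import torch.nn as nn

class SideMemoryFFN(nn.Module):
    def __init__(self, main_ffn: nn.Module, threshold: float = 1.0):
        super().__init__()
        self.main = main_ffn                  # long-term (pretrained) memory
        self.side = copy.deepcopy(main_ffn)   # edits are trained here only
        for p in self.main.parameters():
            p.requires_grad_(False)
        self.threshold = threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        main_out, side_out = self.main(h), self.side(h)
        # Route to the side memory only where it has diverged enough,
        # i.e. where an edit plausibly applies; elsewhere keep the
        # pretrained behavior, preserving locality.
        gate = (side_out - main_out).norm(dim=-1, keepdim=True) > self.threshold
        return torch.where(gate, side_out, main_out)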


Divide-and-Conquer Predictive Coding: A Structured Bayesian Inference Algorithm
Eli Sennesh, Hao Wu (Department of Psychology, Vanderbilt University, Nashville, TN, USA)

Neural Information Processing Systems

Unexpected stimuli induce "error" or "surprise" signals in the brain. The theory of predictive coding promises to explain these observations in terms of Bayesian inference by suggesting that the cortex implements variational inference in a probabilistic graphical model. However, when applied to machine learning tasks, this family of algorithms has yet to perform on par with other variational approaches in high-dimensional, structured inference problems. To address this, we introduce a novel predictive coding algorithm for structured generative models that we call divide-and-conquer predictive coding (DCPC). It differs from other formulations of predictive coding in that it respects the correlation structure of the generative model and provably performs maximum-likelihood updates of model parameters, all without sacrificing biological plausibility. Empirically, DCPC achieves better numerical performance than competing algorithms and provides accurate inference in a number of problems not previously addressed with predictive coding. We provide an open implementation of DCPC in Pyro on GitHub.
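
For context, the sketch below implements one step of textbook predictive coding for a simple generative model x ≈ f(z): the prediction error drives both inference (updating the latent z) and learning (updating the parameters of f). This shows the generic scheme DCPC builds on, not the divide-and-conquer algorithm itself.

import torch

def predictive_coding_step(x, z, f, lr_z=0.1, lr_theta=0.01):
    """One error-driven update for latents (inference) and weights (learning)."""
    z = z.clone().requires_grad_(True)
    eps = x - f(z)                    # the "error"/"surprise" signal
    energy = 0.5 * (eps ** 2).sum()   # Gaussian negative log-likelihood (up to a constant)
    energy.backward()
    with torch.no_grad():
        z -= lr_z * z.grad            # inference: move z to reduce the error
        for p in f.parameters():
            p -= lr_theta * p.grad    # learning: maximum-likelihood direction
            p.grad = None
    return z.detach(), energy.item()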


A Neuralink Rival Just Tested a Brain Implant in a Person

WIRED

Brain-computer interface startup Paradromics today announced that surgeons successfully inserted the company's brain implant into a patient and safely removed it after about 10 minutes. It's a step toward longer trials of the device, dubbed Connexus. It's also the latest commercial development in a growing field of companies--including Elon Musk's Neuralink--aiming to connect people's brains directly to computers. With the Connexus, Austin-based Paradromics is looking to restore speech and communication in people with spinal cord injury, stroke, or amyotrophic lateral sclerosis, also known as ALS. The device is designed to translate neural signals into synthesized speech, text, and cursor control.


Jan P. Bauer

Neural Information Processing Systems

Andrew M. Saxe, Christopher Summerfield, Ali Hummos (affiliations include Exp. Psychology, Oxford; ELSC, HebrewU; Department of Computing, Imperial College London; Brain Mind Institute, EPFL; Gatsby Unit, UCL)


Adapting Neural Architectures Between Domains (Supplementary Material)
Yanxi Li

Neural Information Processing Systems

This supplementary material consists of three parts: the proofs of all lemmas, theorems, and corollaries (Section A), details of the experimental setting (Section B), and additional experimental results (Section C). Section A.1 proves Lemma 1 [2], which concerns a representation function R: X → Z. Section A.2 proves Theorem 2 by taking a union bound of Eq. 7 over all h ∈ H; combining Theorem 2 with Lemma 3 yields the proof of Corollary 4, and applying the bound between the expected and the empirical domain distance from [6] gives the final result. Section B.1 describes the NAS search space: following many previous works [3, 5, 7, 9, 10], we use the NASNet search space [10], which contains two kinds of cells, normal cells and reduction cells. Normal cells use stride 1 and maintain the size of the feature maps.


Multimodal Learning and Reasoning for Visual Question Answering

Neural Information Processing Systems

Reasoning about entities and their relationships from multimodal data is a key goal of Artificial General Intelligence. The visual question answering (VQA) problem is an excellent way to test such reasoning capabilities of an AI model and its multimodal representation learning. However, current VQA models are oversimplified deep neural networks, comprising a long short-term memory (LSTM) unit for question comprehension and a convolutional neural network (CNN) for learning a single image representation. We argue that this single visual representation contains only limited, general information about the image contents and thus restricts the model's reasoning capabilities. In this work, we introduce a modular neural network model that learns a multimodal and multifaceted representation of the image and the question. The proposed model learns to use this multimodal representation to reason about the image entities and achieves new state-of-the-art performance on both VQA benchmark datasets, VQA v1.0 and v2.0, by a wide margin.
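
For reference, a minimal version of the LSTM-plus-CNN baseline the abstract criticizes could look like the sketch below; the 2048-dimensional pooled CNN feature and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    def __init__(self, vocab_size: int, n_answers: int, d: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.lstm = nn.LSTM(d, d, batch_first=True)  # question comprehension
        self.img_proj = nn.Linear(2048, d)           # single pooled CNN feature
        self.classify = nn.Linear(d, n_answers)

    def forward(self, question_ids, image_feats):
        _, (h, _) = self.lstm(self.embed(question_ids))
        fused = h[-1] * self.img_proj(image_feats)   # elementwise fusion
        return self.classify(fused)

The paper's point is that this single fused vector is the bottleneck; its modular model instead learns a multifaceted, multimodal representation of the image and the question.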


Neural Embeddings Rank: Aligning 3D latent dynamics with movements

Neural Information Processing Systems

Aligning neural dynamics with movements is a fundamental goal in neuroscience and brain-machine interfaces. However, there is still a lack of dimensionality reduction methods that can effectively align low-dimensional latent dynamics with movements. To address this gap, we propose Neural Embeddings Rank (NER), a technique that embeds neural dynamics into a 3D latent space and contrasts the embeddings based on movement ranks. NER learns to regress continuous representations of neural dynamics (i.e., embeddings) on continuous movements. We apply NER and six other dimensionality reduction techniques to neurons in the primary motor cortex (M1), dorsal premotor cortex (PMd), and primary somatosensory cortex (S1) as monkeys perform reaching tasks.
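
The contrast-by-rank idea can be sketched as a pairwise ranking loss: take pairs of trials, order them by a movement variable, and require a projection of the 3D embedding to respect that order. The loss below is illustrative, not the exact NER objective; the encoder and the norm-based projection are assumptions.

import torch
import torch.nn as nn

def rank_alignment_loss(encoder, neural, movement, margin=0.1):
    """encoder: maps (B, n_neurons) activity to (B, 3) embeddings.
    movement: (B,) scalar movement variable, e.g. reach speed."""
    z = encoder(neural)                     # (B, 3) latent dynamics
    score = z.norm(dim=-1)                  # 1-D projection to rank on
    i, j = torch.triu_indices(len(score), len(score), offset=1)
    sign = torch.sign(movement[i] - movement[j])   # desired ordering of each pair
    return nn.functional.margin_ranking_loss(score[i], score[j], sign, margin=margin)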


A Unifying Normative Framework of Decision Confidence

Neural Information Processing Systems

Self-assessment of one's choices, i.e., confidence, is the topic of many decision neuroscience studies. Computational models of confidence, however, are limited to specific scenarios, such as choices between options of equal value. Here we present a normative framework for modeling decision confidence that generalizes across tasks and experimental setups.
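
As a minimal worked example of a normative confidence model: in a two-alternative task where the evidence is x ~ N(+mu, sigma^2) under one option and N(-mu, sigma^2) under the other, and the observer chooses by the sign of x, the Bayes-optimal confidence is the posterior probability that the choice is correct, which works out to a logistic function of |x|.

import math

def confidence(x: float, mu: float = 1.0, sigma: float = 1.0) -> float:
    # Posterior probability that the sign(x) choice is correct:
    # p(correct | x) = 1 / (1 + exp(-2 * mu * |x| / sigma**2))
    return 1.0 / (1.0 + math.exp(-2.0 * mu * abs(x) / sigma ** 2))

print(confidence(0.1), confidence(2.0))  # weak vs. strong evidence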