SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation Zhuoyan Luo

Neural Information Processing Systems

This paper studies referring video object segmentation (RVOS) by boosting video-level visual-linguistic alignment. Recent approaches model the RVOS task as a sequence prediction problem and perform multi-modal interaction as well as segmentation for each frame separately. However, the lack of a global view of video content makes it difficult to exploit inter-frame relationships and to understand textual descriptions of object temporal variations. To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment. By associating a group of frame-level object embeddings with language tokens, SOC facilitates joint space learning across modalities and time steps. Moreover, we present multi-modal contrastive supervision to help construct a well-aligned joint space at the video level. We conduct extensive experiments on popular RVOS benchmarks, and our method outperforms state-of-the-art competitors on all of them by a remarkable margin. In addition, the emphasis on temporal coherence improves segmentation stability and makes our method more adaptable to text expressions describing temporal variations.
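The two ingredients the abstract names, temporal aggregation of frame-level object embeddings and video-level contrastive supervision against the language embedding, can be illustrated with a minimal sketch. All module names, tensor shapes, and the pooling scheme below are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed shapes and module names, not the authors' code):
# aggregate frame-level object embeddings into video-level object tokens and
# align them with sentence embeddings via a symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectCluster(nn.Module):
    """Aggregates per-frame object queries into video-level object tokens."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_objs: torch.Tensor) -> torch.Tensor:
        # frame_objs: (B, T, N, D) -> treat each object slot's T frames as a sequence
        B, T, N, D = frame_objs.shape
        x = frame_objs.permute(0, 2, 1, 3).reshape(B * N, T, D)
        # self-attention over time gives each object slot a temporally aggregated token
        out, _ = self.temporal_attn(x, x, x)
        return out.mean(dim=1).reshape(B, N, D)  # (B, N, D) video-level object tokens

def video_contrastive_loss(obj_tokens, text_emb, temperature=0.07):
    """Symmetric InfoNCE between pooled video-level object tokens and sentence embeddings."""
    video_emb = F.normalize(obj_tokens.mean(dim=1), dim=-1)  # (B, D)
    text_emb = F.normalize(text_emb, dim=-1)                 # (B, D)
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

In this sketch, matching video-text pairs within a batch serve as positives and all other pairings as negatives, which is one standard way to realize the video-level multi-modal contrastive supervision the abstract describes.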


Interpreting Learned Feedback Patterns in Large Language Models Luke Marks Amir Abdullah Clement Neo

Neural Information Processing Systems

Reinforcement learning from human feedback (RLHF) is widely used to train large language models (LLMs). However, it is unclear whether LLMs accurately learn the underlying preferences in human feedback data. We coin the term Learned Feedback Pattern (LFP) for patterns in an LLM's activations learned during RLHF that improve its performance on the fine-tuning task. We hypothesize that LLMs whose LFPs are accurately aligned with the fine-tuning feedback exhibit consistent activation patterns for outputs that would have received similar feedback during RLHF. To test this, we train probes to estimate the feedback signal implicit in the activations of a fine-tuned LLM. We then compare these estimates to the true feedback, measuring how accurately the LFPs reflect the fine-tuning feedback. Our probes are trained on a condensed, sparse, and interpretable representation of LLM activations, making it easier to correlate features of the input with the probe's predictions. We validate the probes by comparing the neural features they associate with positive-feedback inputs against the features GPT-4 describes and classifies as related to LFPs. Understanding LFPs can help minimize discrepancies between LLM behavior and training objectives, which is essential for the safety and alignment of LLMs.
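The probing step the abstract describes, regressing a feedback signal from a sparse representation of activations, can be sketched as follows. The data here is synthetic and every name (`sparse_codes`, `feedback`) is a hypothetical stand-in; this is only the generic shape of such a probe, not the paper's pipeline.

```python
# Illustrative sketch only: a linear probe trained on a sparse, interpretable
# representation of LLM activations to estimate the implicit feedback signal.
# Data is synthetic; names are hypothetical, not the paper's code.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features = 2000, 512
# sparse, non-negative feature codes standing in for a condensed activation representation
sparse_codes = rng.random((n_samples, n_features)) * (rng.random((n_samples, n_features)) < 0.05)
# synthetic "true" feedback signal with noise
feedback = sparse_codes @ rng.normal(size=n_features) * 0.1 + rng.normal(scale=0.1, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(sparse_codes, feedback, random_state=0)
probe = Ridge(alpha=1.0).fit(X_train, y_train)

# Agreement between probe estimates and the true feedback measures how faithfully
# the learned feedback pattern tracks the fine-tuning feedback.
print("probe R^2 on held-out activations:", probe.score(X_test, y_test))

# Features with the largest probe weights are candidates for comparison against
# model-generated (e.g. GPT-4) descriptions of feedback-related features.
top_features = np.argsort(np.abs(probe.coef_))[::-1][:10]
print("most feedback-correlated feature indices:", top_features)
```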



A Geometric Analysis of Neural Collapse with Unconstrained Features

Neural Information Processing Systems

We provide the first global optimization landscape analysis of Neural Collapse -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported in [1], this phenomenon implies that (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified unconstrained feature model, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. Our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized, matching empirical observations in practical deep network architectures. These findings have important practical implications. For example, our experiments demonstrate that one can set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF during network training, which reduces memory cost by over 20% on ResNet18 without sacrificing generalization performance.
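The practical recipe mentioned at the end, fixing the last-layer classifier to a Simplex ETF with feature dimension equal to the number of classes, can be sketched with the standard ETF construction. The module wiring below is an assumption for illustration; only the ETF formula and the idea of freezing the classifier come from the abstract.

```python
# Minimal sketch of fixing the last-layer classifier to a Simplex ETF with
# feature dimension K = number of classes. Wiring is assumed, not the paper's code.
import torch
import torch.nn as nn

def simplex_etf(num_classes: int) -> torch.Tensor:
    """K unit-norm class vectors in R^K with pairwise inner product -1/(K-1)."""
    K = num_classes
    M = torch.eye(K) - torch.ones(K, K) / K
    return (K / (K - 1)) ** 0.5 * M  # columns are the ETF directions

class FixedETFClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # the classifier is frozen: stored as a buffer, so it has no gradients
        # and no optimizer state, which is where the memory saving comes from
        self.register_buffer("weight", simplex_etf(num_classes))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, K); logits are inner products with the fixed ETF directions
        return features @ self.weight

# Example: K = 10 classes, feature dimension also set to 10.
clf = FixedETFClassifier(10)
logits = clf(torch.randn(4, 10))
```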


Brant: Foundation Model for Intracranial Neural Signal

Neural Information Processing Systems

We propose a foundation model named Brant for modeling intracranial recordings, which learns powerful representations of intracranial neural signals through pre-training, providing a large-scale, off-the-shelf model for medicine. Brant is the largest model in the field of brain signals and is pre-trained on a large corpus of intracranial data that we collected. Brant is designed to capture long-term temporal dependencies and spatial correlations in neural signals, combining information from both the time and frequency domains. As a foundation model, Brant achieves state-of-the-art performance on various downstream tasks.
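The abstract does not detail the architecture, but the stated design choice of combining time-domain and frequency-domain information from neural signals can be sketched as a patch embedding that fuses both views. Every name, shape, and the fusion scheme below is an assumption, not Brant's actual architecture.

```python
# Rough sketch of fusing time- and frequency-domain information from a
# neural-signal patch before a transformer encoder. All names, shapes, and the
# embedding scheme are assumptions, not Brant's actual architecture.
import torch
import torch.nn as nn

class TimeFreqEmbedding(nn.Module):
    def __init__(self, patch_len: int, dim: int):
        super().__init__()
        self.time_proj = nn.Linear(patch_len, dim)
        self.freq_proj = nn.Linear(patch_len // 2 + 1, dim)  # rFFT magnitude bins

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, channels, num_patches, patch_len)
        time_emb = self.time_proj(patches)
        freq_emb = self.freq_proj(torch.fft.rfft(patches, dim=-1).abs())
        return time_emb + freq_emb  # one fused token per patch

emb = TimeFreqEmbedding(patch_len=250, dim=128)
tokens = emb(torch.randn(2, 16, 8, 250))  # -> (2, 16, 8, 128)
```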


Breaking encryption with a quantum computer just got 20 times easier

New Scientist

Quantum computers could crack a common data encryption technique once they have a million qubits, or quantum bits. While this is still well beyond the capabilities of existing quantum computers, this new estimate is 20 times lower than previously thought, suggesting the day encryption is cracked is closer than we think.


A Unified Discretization Framework for Differential Equation Approach with Lyapunov Arguments for Convex Optimization

Neural Information Processing Systems

The differential equation (DE) approach for convex optimization, which relates optimization methods to specific continuous DEs with rate-revealing Lyapunov functionals, has gained increasing interest since the seminal paper by Su-Boyd-Candès (2014). However, the approach still lacks a crucial component to make it truly useful: there is no general, consistent way to transition back to discrete optimization methods. Consequently, even if we derive insights from continuous DEs, we still need to perform individualized and tedious calculations for the analysis of each method. This paper aims to bridge this gap by introducing a new concept called "weak discrete gradient" (wDG), which consolidates the conditions required for discrete versions of gradients in the DE approach arguments. We then define abstract optimization methods using wDG and provide abstract convergence theories that parallel those in continuous DEs. We demonstrate that many typical optimization methods and their convergence rates can be derived as special cases of this abstract theory. The proposed unified discretization framework for the differential equation approach to convex optimization provides an easy environment for developing new optimization methods and achieving competitive convergence rates with state-of-the-art methods, such as Nesterov's accelerated gradient.
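For readers unfamiliar with the DE approach, the basic pattern the abstract builds on is the standard gradient-flow example with a rate-revealing Lyapunov functional, shown below. This is textbook background, not the paper's wDG definition; the discrete map g at the end is an illustrative placeholder.

```latex
% Standard example of the DE approach (background, not the paper's wDG definition):
% gradient flow with a rate-revealing Lyapunov functional.
\begin{align*}
  &\text{Continuous dynamics:} && \dot{x}(t) = -\nabla f\bigl(x(t)\bigr),\\
  &\text{Lyapunov functional:} && E(t) = f\bigl(x(t)\bigr) - f(x^\star),\\
  &\text{Decay along trajectories:} && \dot{E}(t) = -\bigl\lVert \nabla f\bigl(x(t)\bigr)\bigr\rVert^2 \le 0.
\end{align*}
% A discretization $x_{k+1} = x_k - h\, g(x_{k+1}, x_k)$ recovers, e.g., gradient
% descent when $g(x_{k+1}, x_k) = \nabla f(x_k)$. The paper's weak discrete gradient
% abstracts the conditions such a $g$ must satisfy so that the continuous Lyapunov
% argument carries over to the discrete iteration.
```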


Neural Architecture Dilation for Adversarial Robustness (Supplementary Material) Yanxi Li

Neural Information Processing Systems

For the dilation architecture, we use a DAG with 4 nodes as the supernetwork. There are 8 candidate operations for each edge: 4 convolutional operations (3×3 separable convolution, 5×5 separable convolution, 3×3 dilated separable convolution, and 5×5 dilated separable convolution), 2 pooling operations (3×3 average pooling and 3×3 max pooling), and 2 special operations (an identity operation representing a skip-connection and a zero operation indicating that two nodes are not connected). During dilation, we stack 3 cells for each of the 3 blocks in the WRN34-10. During retraining, this number is increased to 6. The dilated architectures designed by NADAR are shown in Figure 1.
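The search-space description above maps directly onto a DARTS-style primitive set; the sketch below writes it down as configuration. The identifiers are illustrative labels, not the authors' code.

```python
# The eight candidate operations per edge of the dilation cell, written as a
# DARTS-style primitive set. Identifiers are illustrative, not the authors' code.
CANDIDATE_OPS = [
    "sep_conv_3x3",   # 3x3 separable convolution
    "sep_conv_5x5",   # 5x5 separable convolution
    "dil_conv_3x3",   # 3x3 dilated separable convolution
    "dil_conv_5x5",   # 5x5 dilated separable convolution
    "avg_pool_3x3",   # 3x3 average pooling
    "max_pool_3x3",   # 3x3 max pooling
    "skip_connect",   # identity (skip-connection)
    "none",           # zero op: the two nodes are not connected
]

# A cell is a DAG over 4 intermediate nodes; every edge selects one op above.
NUM_NODES = 4
CELLS_PER_BLOCK_DILATION = 3   # cells stacked per WRN34-10 block while dilating
CELLS_PER_BLOCK_RETRAIN = 6    # increased to 6 during retraining
```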


Why the argument for WFH could get a big boost from AI

ZDNet

The pandemic changed how people worked, shifting most professionals to remote or hybrid models. For the software company Atlassian, this flexible, distributed approach persists to this day. "We have 13,000 employees spread across the globe, and individuals can choose their working location every day," said Annie Dean, Head of Team Anywhere, Atlassian's distributed work policy. "It's about how we work, not where we work." The implementation of the flexible model has produced positive effects for employees and the company alike. Internal data reveals that even though only 34% of employees have opted to work from home, 92% of Atlassian employees reported that the ability to work from anywhere allows them to perform their best, and 91% said it's an important reason for staying at the company.


Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation

Neural Information Processing Systems

The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise, scaled to the statistic's global sensitivity, to the statistic of interest. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provide greater accuracy under average-case distributional assumptions. Specifically, we consider the basic problem of privately estimating the mean of a real distribution from i.i.d. samples.
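As background for the contrast the abstract draws, the global-sensitivity baseline adds Laplace noise scaled to the worst-case change of the statistic. The sketch below shows that baseline for the mean of bounded data; the data bounds, function name, and example are illustrative assumptions, and the paper's instance-dependent (smooth-sensitivity) scaling is not reproduced here.

```python
# Baseline the abstract contrasts against: Laplace noise scaled to the GLOBAL
# sensitivity of the mean of n values known to lie in [lo, hi]. The paper's
# instance-dependent scaling is not shown; names and bounds are illustrative.
import numpy as np

def private_mean_global_sensitivity(x: np.ndarray, lo: float, hi: float,
                                     epsilon: float, rng=None) -> float:
    if rng is None:
        rng = np.random.default_rng()
    x = np.clip(x, lo, hi)
    n = len(x)
    # Changing one record moves the mean by at most (hi - lo) / n.
    global_sensitivity = (hi - lo) / n
    noise = rng.laplace(scale=global_sensitivity / epsilon)
    return float(x.mean() + noise)

# Example: the noise scale is driven by the assumed range [lo, hi], even when the
# realized sample is tightly concentrated -- the conservatism the paper targets.
sample = np.random.default_rng(0).normal(loc=0.0, scale=0.1, size=1000)
print(private_mean_global_sensitivity(sample, lo=-10.0, hi=10.0, epsilon=1.0))
```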