Goto

Collaborating Authors

 Technology


Concept-Guided Interpretability via Neural Chunking

Neural Information Processing Systems

Neural networks are often described as black boxes, reflecting the significant challenge of understanding their internal workings and interactions. We propose a different perspective that challenges the prevailing view: rather than being inscrutable, neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We refer to this as the Reflection Hypothesis and provide evidence for this phenomenon in both simple recurrent neural networks (RNNs) and complex large language models (LLMs). Building on this insight, we propose to leverage our cognitive tendency of chunking to segment high-dimensional neural population dynamics into interpretable units that reflect underlying concepts. We propose three methods to extract recurring chunks on a neural population level, complementing each other based on label availability and neural data dimensionality.


WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Neural Information Processing Systems

Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the specified environment's dynamics usually bottlenecks LLMs' performance as world models. To bridge the gap, we propose a training-free world alignment that learns an environment's symbolic knowledge complementary to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable codes to regulate LLM agents' policies.


System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

Neural Information Processing Systems

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly, failing to distinguish critical deductions from auxiliary steps and resulting in suboptimal use of computational resources. In this paper, we propose System-1.5


Multi-View Oriented GPLVM: Expressiveness and Efficiency

Neural Information Processing Systems

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we design a new form of random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.


The Complexity of Correlated Equilibria in Generalized Games

Neural Information Processing Systems

Correlated equilibria--and their generalizations known as Φ-equilibria--are a fundamental object of study in game theory, offering a more tractable alternative to Nash equilibria in multi-player settings. While computational aspects of equilibrium computation are well-understood in some settings, fundamental questions are still open in generalized games, that is, games in which the set of strategies allowed to each player depends on the other players' strategies. These classes of games model fundamental settings in economics, and have been a cornerstone of economics research since the seminal paper of Arrow and Debreu [1954]. Recently, there has been growing interest, both in economics and in computer science, in studying correlated equilibria in generalized games. It is known that finding a social welfare maximizing correlated equilibrium in generalized games is NP-hard. However, the existence of efficient algorithms to find any equilibrium remains an important open question.


EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding

Neural Information Processing Systems

Operating rooms (ORs) demand precise coordination among surgeons, nurses, and equipment in a fast-paced, occlusion-heavy environment, necessitating advanced perception models to enhance safety and efficiency. Existing datasets either provide partial egocentric views or sparse exocentric multi-view context, but do not explore the comprehensive combination of both. We introduce EgoExOR, the first OR dataset and accompanying benchmark to fuse first-person and thirdperson perspectives. Spanning 94 minutes (84,553 frames at 15 FPS) of two emulated spine procedures, Ultrasound-Guided Needle Insertion and Minimally Invasive Spine Surgery, EgoExOR integrates egocentric data (RGB, gaze, hand tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D cameras, and ultrasound imagery. Its detailed scene graph annotations, covering 36 entities and 22 relations (568,235 triplets), enable robust modeling of clinical interactions, supporting tasks like action recognition and human-centric perception. We evaluate the surgical scene graph generation performance of two adapted state-of-the-art models and offer a new baseline that explicitly leverages EgoExOR's multimodal and multi-perspective signals. This new dataset and benchmark set a new foundation for OR perception, offering a rich, multimodal resource for next-generation clinical perception.


Energy Independence is Becoming Solar's Strongest Selling Point

TIME - Tech

Follow this section to personalize your feed and get instant alerts. Follow Go to your personalized feed WHY FOLLOW? Smart Alerts: Get notified about major news as it happens. Follow this tag to personalize your feed and get instant alerts. Follow Go to your personalized feed WHY FOLLOW?


ASingle-Swap Local Search Algorithm for k-means of Lines

Neural Information Processing Systems

Clustering is a fundamental problem that has been extensively studied over past few decades, with most research focusing on point-based clustering such as kmeans, k-median, and k-center. However, numerous real-world applications, such as motion analysis, computer vision, and missing data analysis, require clustering over structured data, including lines, time series and affine subspaces (flats), where traditional point-based clustering algorithms often fall short. In this paper, we study the k-means of lines problem, where the input is a set L of lines in Rd, and the goal is to find k centers C in Rd such that the sum of squared distances from each line in L to its nearest center in C is minimized. The local search algorithm is a well-established strategy for point-based k-means clustering, known for its efficiency and provable approximation guarantees. However, extending local search algorithm to the k-means of lines problem is nontrivial, as the capture relation used in point-based clustering does not generalize to the line setting.


Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Neural Information Processing Systems

Adversarial examples crafted on one model often exhibit poor transferability to others, hindering their effectiveness in black-box settings. This limitation arises from two key factors: (i) decision-boundary variation across models and (ii) representation drift in feature space. We address these challenges through a new perspective that frames transferability for untargeted attacks as a consensus-robust optimization problem: adversarial perturbations should remain effective across a neighborhood of plausible target models. To model this uncertainty, we introduce two complementary perturbation channels: a parameter channel, capturing boundary shifts via weight perturbations, and a representation channel, addressing feature drift via stochastic blending of clean and adversarial activations. We then propose CORTA (COnsensus-Robust Transfer Attack), a lightweight attack instantiated from this robust formulation using two first-order strategies: (i) sensitivity regularization based on the squared Frobenius norm of logits' Jacobian with respect to weights, and (ii) Monte Carlo sampling for blended feature representations. Our theoretical analysis provides a certified lower bound linking these approximations to the robust objective. Extensive experiments on CIFAR-100 and ImageNet show that CORTA significantly outperforms state-of-the-art transfer-based methods-- including ensemble approaches--across CNN and Vision Transformer targets. Notably, CORTA achieves a 19.1 percentage-point gain in transfer success rate over the best prior method while using only a single surrogate model.


APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Neural Information Processing Systems

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated humanagent interplay. We train a family of models--the xLAM-2-fc-rseries with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on τ-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields highquality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-rmodels to advance research in AI agents.