Goto

Collaborating Authors

 Technology


ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

Neural Information Processing Systems

Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AIOverview) present an interesting setting for studying and protecting against such threats, as defense algorithms can benefit from built-in reliability signals--like document ranking--and represent a non-LLM challenge for the adversary due to decades of work to thwart SEO. Motivated by, but not limited to, this scenario, this work introduces ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our first contribution adopts a graph-theoretic perspective to identify a "consistent majority" among retrieved documents to filter out malicious ones. We introduce a novel algorithm based on finding a Maximum Independent Set (MIS) on a document graph where edges encode contradiction. Our MIS variant explicitly prioritizes higher-reliability documents and provides provable robustness guarantees against bounded adversarial corruption under natural assumptions. Recognizing the computational cost of exact MIS for large retrieval sets, our second contribution is a scalable weighted sample and aggregate framework.


Automatic Visual Instrumental Variable Learning for Confounding-Resistant Domain Generalization

Neural Information Processing Systems

Many confounding-resistant domain generalization methods for image classification have been developed based on causal interventions. However, their reliance on strong assumptions limits their effectiveness in handling unobserved confounders. Although recent work introduces instrumental variables (IVs) to overcome this limitation, the reliance on manually predefined instruments, particularly in the context of visual data, may result in severe bias or invalidity when IV conditions are violated. To address these issues, we propose a novel approach to automatically learning Visual Instrumental Variables for confounding-resistant Domain Generalization (VIV-DG). We observe that certain non-causal visual attributes in image data naturally satisfy the basic conditions required for valid IVs. Motivated by this insight, we propose the visual instrumental variable, a novel concept that extends classical IV theory to the visual domain. Furthermore, we develop an automatic visual instrumental variable learner that enforces IV conditions on learned representations, enabling the automatic learning of valid visual instrumental variables from image data. Ultimately, VIV-DG inherits the strengths of classical IVs to mitigate unobserved confounding and avoids the significant bias caused by violations of IV conditions in predefined IVs. Extensive experiments on multiple benchmarks verify that VIV-DG achieves superior generalization ability.



41128e5b3a7622da5b17588757599077-Paper-Conference.pdf

Neural Information Processing Systems

In this work, we introduce ChatVLA-2, a novel mixture-ofexpert VLA model coupled with a specialized two-stage training pipeline designed to preserve the VLM's original strengths while enabling actionable reasoning. To validate our approach, we design a math-matching task wherein a robot interprets math problems written on a whiteboard and picks corresponding number cards from a table to solve equations. Remarkably, our method exhibits exceptional mathematical reasoning and OCR capabilities, despite these abilities not being explicitly trained within the VLA. Furthermore, we demonstrate that the VLA possesses strong spatial reasoning skills, enabling it to interpret novel directional instructions involving previously unseen objects. Overall, our method showcases reasoning and comprehension abilities that significantly surpass state-of-the-art imitation learning methods such as OpenVLA, DexVLA, and ฯ€0. This work represents a substantial advancement toward developing truly generalizable robotic foundation models endowed with robust reasoning capacities.


Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM

Neural Information Processing Systems

Training Long-Context Large Language Models (LLMs) is challenging, as hybrid training with long-context and short-context data often leads to workload imbalances. Existing works mainly use data packing to alleviate this issue, but fail to consider imbalanced attention computation and wasted communication overhead. This paper proposes Hierarchical Balance Packing (HBP), which designs a novel batch-construction method and training recipe to address those inefficiencies.


SegMASt3R: Geometry Grounded Segment Matching

Neural Information Processing Systems

Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features, segment matching captures structured regions, offering greater robustness to occlusions, lighting variations, and viewpoint changes. In this paper, we leverage the spatial understanding of 3D foundation models to tackle wide-baseline segment matching, a challenging setting involving extreme viewpoint shifts. We propose an architecture that uses the inductive bias of these 3D foundation models to match segments across image pairs with up to 180 rotation. Extensive experiments show that our approach outperforms state-of-the-art methods, including the SAM2 video propagator and local feature matching methods, by up to 30% on the AUPRC metric, on ScanNet++ and Replica datasets. We further demonstrate benefits of the proposed model on relevant downstream tasks, including 3D instance mapping and object-relative navigation.


40d45b1e23d00d5895e65778e85cf8ee-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization. However, applying AI effectively requires simulation platforms for scalable training and evaluation--yet existing environments remain limited to simplified, narrowly scoped tasks, falling short of capturing complex economic challenges such as demographic shifts, multigovernment coordination, and large-scale agent interactions. To address this gap, we introduce EconGym, a scalable and modular testbed that connects diverse economic tasks with AI algorithms. Grounded in rigorous economic modeling, EconGym implements 11 heterogeneous role types (e.g., households, firms, banks, governments), their interaction mechanisms, and agent models with well-defined observations, actions, and rewards. Users can flexibly compose economic roles with diverse agent algorithms to simulate rich multi-agent trajectories across 25+ economic tasks for AI-driven policy learning and analysis. Experiments show that EconGym supports diverse and cross-domain tasks--such as coordinating fiscal, pension, and monetary policies--and enables benchmarking across AI, economic methods, and hybrids. Results indicate that richer task composition and algorithm diversity expand the policy space, while AI agents guided by classical economic methods perform best in complex settings.


40b5237c3e025c72c02dd8b6716dac76-Paper-Conference.pdf

Neural Information Processing Systems

Graph-based recommender systems have achieved remarkable effectiveness by modeling high-order interactions between users and items. However, such approaches are significantly undermined by popularity bias, which distorts the interaction graph's structure--referred to as topology bias. This leads to overrepresentation of popular items, thereby reinforcing biases and fairness issues through the user-system feedback loop. Despite attempts to study this effect, most prior work focuses on the embedding or gradient level bias, overlooking how topology bias fundamentally distorts the message passing process itself. We bridge this gap by providing an empirical and theoretical analysis from a Dirichlet energy perspective, revealing that graph message passing inherently amplifies topology bias and consistently benefits highly connected nodes. To address these limitations, we propose Test-time Simplicial Propagation (TSP), which extends message passing to higher-order simplicial complexes. By incorporating richer structures beyond pairwise connections, TSP mitigates harmful topology bias and substantially improves the representation and recommendation of long-tail items during inference. Extensive experiments across five real-world datasets demonstrate the superiority of our approach in mitigating topology bias and enhancing recommendation quality. The implementation code is available at https://github.com/sotaagi/TSP.


GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow

Neural Information Processing Systems

Spatial transcriptomics technologies can be used to align transcriptomes with histopathological morphology, presenting exciting new opportunities for biomolecular discovery. Using spatial transcriptomic gene expression and corresponding histology data, we construct a novel framework, GeneFlow, to map single-and multi-cell gene expression onto paired cellular images. By combining an attentionbased RNA encoder with a conditional UNet guided by rectified flow, we generate high-resolution images with different staining methods (e.g., H&E, DAPI) to highlight various cellular/ tissue structures. Rectified flow with high-order ODE solvers creates a continuous, bijective mapping between expression and image manifolds, addressing the many-to-one relationship inherent in this problem. Our method enables the generation of realistic cellular morphology features and spatially resolved intercellular interactions under genetic or chemical perturbations. This enables minimally invasive disease diagnosis by revealing dysregulated patterns in imaging phenotypes. Our rectified flow based method outperforms diffusion methods and baselines in all experiments.


Causal Mixture Models: Characterization and Discovery

Neural Information Processing Systems

Real-world datasets are often a combination of unobserved subpopulations that follow distinct causal generating processes. In an observational study, for example, participants may fall into unknown groups that either (a) respond effectively to a drug, or (b) show no response due to drug resistance. Not accounting for such heterogeneity then risks biased estimates of drug effectiveness. In this work, we formulate this setting through a causal mixture model, in which the data-generating process of each variable depends on latent group membership (a or b).