[Figure columns: Ref. Images, Ours, GT, Paint-by-Example, Target Images]

Neural Information Processing Systems

Reference-driven image completion, which restores missing regions in a target view using additional images, is particularly challenging when the target view differs significantly from the references. Existing generative methods rely solely on diffusion priors and, without geometric cues such as camera pose or depth, often produce misaligned or implausible content. We propose GeoComplete, a novel framework that incorporates explicit 3D structural guidance to enforce geometric consistency in the completed regions, setting it apart from prior image-only approaches. GeoComplete introduces two key ideas: conditioning the diffusion process on projected point clouds to infuse geometric information, and applying target-aware masking to guide the model toward relevant reference cues. The framework features a dual-branch diffusion architecture.
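The idea of conditioning on projected point clouds can be illustrated with a plain pinhole projection. The sketch below is not GeoComplete's implementation: the function name `project_points`, the intrinsics (`focal`, `cx`, `cy`), and the toy point cloud are all assumptions made for illustration. It only shows how 3D points from reference views could be mapped into the pixel grid of a target camera with a known pose.

```python
def project_points(points_world, rotation, translation, focal, cx, cy):
    """Project 3D world points into pixel coordinates of a pinhole camera.

    rotation is a 3x3 world-to-camera matrix (list of rows) and translation
    a 3-vector; focal, cx, cy are assumed intrinsics. Points at or behind
    the camera plane are skipped. Returns (u, v, depth) tuples.
    """
    pixels = []
    for p in points_world:
        # World -> camera coordinates: X_cam = R @ X_world + t
        xc, yc, zc = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:
            continue
        # Perspective division followed by the intrinsics.
        pixels.append((focal * xc / zc + cx, focal * yc / zc + cy, zc))
    return pixels


# Hypothetical toy data: identity pose, three points, one behind the camera.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0.0, 0.0, 2.0), (1.0, -1.0, 4.0), (0.0, 0.0, -1.0)]
projected = project_points(
    cloud, IDENTITY, [0.0, 0.0, 0.0], focal=100.0, cx=64.0, cy=64.0
)
```

The resulting sparse set of pixel positions and depths is the kind of geometric signal that could be rasterized into a conditioning image; how GeoComplete actually encodes it is not specified in the abstract.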


PUO-Bench: A Panel Understanding and Operation Benchmark with A Privacy-Preserving Framework

Neural Information Processing Systems

Recent advancements in Vision-Language Models (VLMs) have enabled GUI agents to leverage visual features for interface understanding and operation in the digital world. However, limited research has addressed the interpretation of and interaction with control panels in real-world settings. To bridge this gap, we propose the Panel Understanding and Operation (PUO) benchmark, comprising annotated panel images from appliances and associated vision-language instruction pairs. Experimental results on the benchmark demonstrate significant performance disparities between zero-shot and fine-tuned VLMs, revealing the lack of PUO-specific capabilities in existing language models. Furthermore, we introduce a Privacy-Preserving Framework (PPF) to address privacy concerns in cloud-based panel parsing and reasoning. PPF employs a dual-stage architecture, performing panel understanding on edge devices while delegating complex reasoning to cloud-based LLMs. Although this design introduces a performance trade-off due to edge model limitations, it eliminates the transmission of raw visual data, thereby mitigating privacy risks. Overall, this work provides foundational resources and methodologies for advancing interactive human-machine systems and the robotics field in panel-centric applications.


3255a7554605a88800f4e120b3a929e1-Paper-Conference.pdf

Neural Information Processing Systems

Large language models (LLMs) frequently generate hallucinations--content that deviates from factual accuracy or provided context--posing challenges for diagnosis due to the complex interplay of underlying causes. This paper introduces a subsequence association framework to systematically trace and understand hallucinations. Our key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. Through theoretical and empirical analyses, we demonstrate that decoder-only transformers effectively function as subsequence embedding models, with linear layers encoding input-output associations. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts. Experiments show our method outperforms standard attribution techniques in identifying hallucination causes and aligns with evidence from the model's training corpus. This work provides a unified perspective on hallucinations and a robust framework for their tracing and analysis.


KOALA++: Efficient Kalman-Based Optimization with Gradient-Covariance Products

Neural Information Processing Systems

We propose KOALA++, a scalable Kalman-based optimization algorithm that explicitly models structured gradient uncertainty in neural network training. Unlike second-order methods, which rely on expensive second-order gradient calculations, our method directly estimates the parameter covariance matrix by recursively updating compact gradient covariance products. This design improves upon the original KOALA framework, which assumed a diagonal covariance, by implicitly capturing richer uncertainty structure without storing the full covariance matrix or performing large matrix inversions. Across diverse tasks, including image classification and language modeling, KOALA++ achieves accuracy on par with or better than state-of-the-art first- and second-order optimizers while maintaining the efficiency of first-order methods.


Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Neural Information Processing Systems

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlies the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales.


SAS: Simulated Attention Score

Neural Information Processing Systems

The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, group-query attention, and so on. We further analyze MHA and observe that its performance improves as the number of attention heads increases, provided the hidden size per head remains sufficiently large. Therefore, increasing both the head count and hidden size per head with minimal parameter overhead can lead to significant performance gains at a low cost. Motivated by this insight, we introduce Simulated Attention Score (SAS), which maintains a compact model size while simulating a larger number of attention heads and a larger hidden feature dimension per head. This is achieved by projecting a low-dimensional head representation into a higher-dimensional space, effectively increasing attention capacity without increasing parameter count. Beyond the head representations, we further extend the simulation approach to the feature dimension of the key and query embeddings, enhancing expressiveness by mimicking the behavior of a larger model while preserving the original model size. To control the parameter cost, we also propose Parameter-Efficient Attention Aggregation (PEAA). Comprehensive experiments on a variety of datasets and tasks demonstrate the effectiveness of the proposed SAS method, achieving significant improvements over different attention variants.


Constrained Diffusers for Safe Planning and Control

Neural Information Processing Systems

Diffusion models have shown remarkable potential in planning and control tasks due to their ability to represent multimodal distributions over actions and trajectories. However, ensuring safety under constraints remains a critical challenge for diffusion models. This paper proposes Constrained Diffusers, an extended framework for planning and control that incorporates distribution-level constraints into pretrained diffusion models without retraining or architectural modifications. Inspired by constrained optimization, we apply a constrained Langevin sampling method for the reverse diffusion process that jointly optimizes the trajectory and achieves constraint satisfaction through three iterative algorithms: the projected method, the primal-dual method, and the augmented Lagrangian method. In addition, we incorporate discrete control barrier functions as constraints for constrained diffusers to guarantee safety in online implementation, following a receding-horizon control scheme in which we generate a short-horizon plan and execute only the first action before replanning. Experiments in Maze2D, locomotion, and PyBullet ball running tasks demonstrate that our proposed methods achieve constraint satisfaction with less computation time, and are competitive with existing methods in environments with static and time-varying constraints. The implementation can be found here.
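The projected variant of constrained Langevin sampling can be sketched on a toy problem. This is a minimal illustration, not the paper's algorithm. It makes three assumptions: the analytic score of a standard Gaussian stands in for a pretrained diffusion model's learned score, the constraint set is a simple axis-aligned box, and all names (`score`, `project_to_box`, `projected_langevin`) are hypothetical. Each iteration takes an unconstrained Langevin step and then projects the iterate back onto the feasible set.

```python
import math
import random


def score(x):
    # Gradient of the log-density of a standard Gaussian; a stand-in for
    # the learned score of a pretrained diffusion model.
    return [-xi for xi in x]


def project_to_box(x, low, high):
    # Euclidean projection onto the box [low, high]^d (coordinate clipping).
    return [min(max(xi, low), high) for xi in x]


def projected_langevin(x0, steps, step_size, low, high, rng):
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(steps):
        s = score(x)
        # Unconstrained Langevin update: drift along the score plus noise.
        x = [
            xi + step_size * si + noise_scale * rng.gauss(0.0, 1.0)
            for xi, si in zip(x, s)
        ]
        # Projection step: return to the feasible set after every update.
        x = project_to_box(x, low, high)
    return x


rng = random.Random(0)
samples = [
    projected_langevin([0.0, 0.0], 200, 0.01, 0.5, 2.0, rng)
    for _ in range(20)
]
```

Because the projection runs after every update, every returned sample lies inside the box by construction. The primal-dual and augmented Lagrangian variants named in the abstract would instead handle constraints through multipliers and are not shown here.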


DOJ Lawyers Argue xAI Is 'Vital' for National Security in NAACP Lawsuit

WIRED

In a bid to dismiss a lawsuit over xAI's polluting gas turbines, the Justice Department claimed the company is integral to military operations, including the Iran War. The Department of Justice intervened in a lawsuit over xAI's gas turbines on Monday. In a filing, the agency sided with Elon Musk's company, saying attempts to stop xAI from running the natural gas turbines "threatens American national, economic, and energy security by seeking to shut off the power supply for artificial-intelligence innovation that supports the Department of War's military operations." The DOJ, along with xAI and the state of Mississippi, asked the court to dismiss the suit, filed by the NAACP in April. The NAACP alleges xAI isn't following the Clean Air Act and is endangering public health by running unpermitted natural gas turbines at the site of its second data center in Southaven, Mississippi, dubbed Colossus 2. In May, the NAACP filed a request for a preliminary injunction to stop xAI from running the turbines, alleging that their continued use without a permit "increases risks of asthma attacks and heart disease" in communities with an already heavy pollution burden.


Accelerating 3D Molecule Generative Models with Trajectory Diagnosis

Neural Information Processing Systems

Geometric molecule generative models have found expanding applications across various scientific domains, but their generation inefficiency has become a critical bottleneck. Through a systematic investigation of the generative trajectory, we discover a unique challenge for molecule geometric graph generation: generative models require determining the permutation order of atoms in the molecule before refining its atomic feature values. Based on this insight, we decompose the generation process into a permutation phase and an adjustment phase, and propose a geometric-informed prior and consistency parameter objective to accelerate each phase. Extensive experiments demonstrate that our approach achieves competitive performance with approximately 10 sampling steps, 7.5× faster than previous state-of-the-art models and approximately 100× faster than diffusion-based models, offering a significant step towards scalable molecular generation.


Correcting misinterpretations of additive models

Neural Information Processing Systems

Correct model interpretation in high-stakes settings is critical, yet both post-hoc feature attribution methods and so-called intrinsically interpretable models can systematically attribute false-positive importance to non-informative features such as suppressor variables. Specifically, both linear models and powerful nonlinear generalisations such as General Additive Models (GAMs) are susceptible to spurious attributions to suppressors. We present a principled generalisation of activation patterns - originally developed to make linear models interpretable - to additive models, correctly rejecting suppressor effects for non-linear features. This yields PatternGAM, an importance attribution method based on univariate generative surrogate models for the broad family of additive models, and PatternQLR for polynomial models. Empirical evaluations on the XAI-TRIS benchmark with a novel false-negative invariant formulation of the earth mover's distance accuracy metric demonstrate significant improvements over popular feature attribution methods and the traditional interpretation of additive models. Finally, real-world case studies on the COMPAS and MIMIC-IV datasets provide new insights into the role of specific features by disentangling genuine target-related information from suppression effects that would mislead conventional GAM interpretations.
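The suppressor effect in the linear case can be reproduced with a small synthetic example. The sketch below is an illustration under stated assumptions, not PatternGAM or the XAI-TRIS setup. The data-generating process (`signal`, `distractor`) and all variable names are hypothetical, and the "pattern" is taken to be the plain covariance between each feature and the target. Feature `x2` carries no information about the target, yet the least-squares model needs a large weight on it to cancel the distractor contained in `x1`.

```python
import random

rng = random.Random(0)
n = 5000

# Hypothetical data: the target equals the signal; x1 mixes signal and
# distractor, while x2 is the distractor alone (a suppressor variable).
signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
distractor = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [s + d for s, d in zip(signal, distractor)]
x2 = distractor
y = signal


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Ordinary least squares without intercept via the 2x2 normal equations.
a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
b1, b2 = dot(x1, y), dot(x2, y)
det = a11 * a22 - a12 * a12
w1 = (a22 * b1 - a12 * b2) / det
w2 = (a11 * b2 - a12 * b1) / det


def cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


# Covariance of each feature with the target, used here as the pattern.
pattern = [cov(x1, y), cov(x2, y)]
```

Since `y = x1 - x2` holds exactly, the fitted weights are approximately `(1, -1)`, so reading weights as importance would rank the uninformative `x2` as highly as `x1`. The covariance-based pattern stays near zero for `x2`, which is the distinction the abstract generalises to additive models.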