Goto

Collaborating Authors

 Industry


VideoTitans: Scalable Video Prediction with Integrated Short-and Long-term Memory

Neural Information Processing Systems

Accurate video forecasting enables autonomous vehicles to anticipate hazards, robotics and surveillance systems to predict human intent, and environmental models to issue timely warnings for extreme weather events. However, existing methods remain limited: transformers rely on global attention with quadratic complexity, making them impractical for high-resolution, long-horizon video prediction, while convolutional and recurrent networks suffer from short-range receptive fields and vanishing gradients, losing key information over extended sequences. To overcome these challenges, we introduce VideoTitans, the first architecture to adapt the gradient-driven Titans memory--originally designed for language modelling to video prediction. VideoTitans integrates three core ideas: (i) a sliding-window attention core that scales linearly with sequence length and spatial resolution, (ii) an episodic memory that dynamically retains only informative tokens based on a gradient-based surprise signal, and (iii) a small set of persistent tokens encoding task-specific priors that stabilize training and enhance generalization.


903ceb0ed2d5ceec6e2c9b317b6c54a8-Paper-Conference.pdf

Neural Information Processing Systems

Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this great success, the safety guardrail of LVLMs may not cover the unforeseen domains introduced by the visual modality. Existing studies primarily focus on eliciting LVLMs to generate harmful responses via carefully crafted image-based jailbreaks designed to bypass alignment defenses. In this study, we reveal that a safe image can be exploited to achieve the same jailbreak consequence when combined with additional safe images and prompts. This stems from two fundamental properties of LVLMs: universal reasoning capabilities and safety snowball effect. Building on these insights, we propose Safety Snowball Agent (SSA), a novel agent-based framework leveraging agents' autonomous and tool-using abilities to jailbreak LVLMs. SSAoperates through two principal stages: (1) initial response generation, where tools generate or retrieve jailbreak images based on potential harmful intents, and (2) harmful snowballing, where refined subsequent prompts induce progressively harmful outputs. Our experiments demonstrate that SSAcan use nearly any image to induce LVLMs to produce unsafe content, achieving high success jailbreaking rates against the latest LVLMs. Unlike prior works that exploit alignment flaws, SSAleverages the inherent properties of LVLMs, presenting a profound challenge for enforcing safety in generative multimodal systems.


One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models

Neural Information Processing Systems

For large language models (LLMs), sparse autoencoders (SAEs) have been shown to decompose intermediate representations that often are not interpretable directly into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analysesTextand approaches have been lacking for text-toimage models. We investigate the possibility of using SAEs to learn interpretable features for SDXLTurbo, a few-step text-to-image diffusion model. To this end, SDXL Basewe train SAEs on the updates performed by transformer blocks within SDXL 25 steps Turbo's denoising U-net in its 1-step setting. Interestingly, we find that they generalize to 4-step SDXLTurbo and even to the multi-step SDXL base model (i.e., a different model) without additional training. In addition, we show that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks.


Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution

Neural Information Processing Systems

Diffusion models have transformed image synthesis through iterative denoising, by defining trajectories from noise to coherent data. While their capabilities are widely celebrated, a critical challenge remains unaddressed: ensuring responsible use by verifying whether an image originates from a model's training set, its novel generations or external sources. We introduce a framework that analyzes diffusion trajectories for this purpose. Specifically, we demonstrate that temporal dynamics across the entire trajectory allow for more robust classification and challenge the widely-adopted "Goldilocks zone" conjecture, which posits that membership inference is effective only within narrow denoising stages. More fundamentally, we expose critical flaws in current membership inference practices by showing that representative methods fail under distribution shifts or when model-generated data is present. For model attribution, we demonstrate a first white-box approach directly applicable to diffusion. Ultimately, we propose the unification of data provenance into a single, cohesive framework tailored to modern generative systems.


Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis

Neural Information Processing Systems

Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.


Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism

Neural Information Processing Systems

Subgraph matching in logic circuits is foundational for numerous Electronic Design Automation (EDA) applications, including datapath optimization, arithmetic verification, and hardware trojan detection. However, existing techniques rely primarily on structural graph isomorphism and thus fail to identify function-related subgraphs when synthesis transformations substantially alter circuit topology. To overcome this critical limitation, we introduce the concept of functional subgraph matching, a novel approach that identifies whether a given logic function is implicitly present within a larger circuit, irrespective of structural variations induced by synthesis or technology mapping. Specifically, we propose a two-stage multi-modal framework: (1) learning robust functional embeddings across AIG and post-mapping netlists for functional subgraph detection, and (2) identifying fuzzy boundaries using a graph segmentation approach. Evaluations on standard benchmarks (ITC99, OpenABCD, ForgeEDA) demonstrate significant performance improvements over existing structural methods, with average 93.8% accuracy in functional subgraph detection and a dice score of 91.3% in fuzzy boundary identification. The source code and implementation details can be found at our repository.


From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning

Neural Information Processing Systems

Finetuning provides a scalable and cost-effective means of customizing language models for specific tasks or response styles, with greater reliability than prompting or in-context learning. In contrast, the conventional wisdom is that injecting knowledge via finetuning results in brittle performance and poor generalization. We argue that the dichotomy of "task customization" (e.g., instruction tuning) and "knowledge injection" (e.g., teaching new facts) is a distinction without a difference. We instead identify concrete factors that explain the heterogeneous effectiveness observed with finetuning. To this end, we conduct a large-scale experimental study of finetuning the frontier Gemini v1.5 model family on a spectrum of datasets that are artificially engineered to interpolate between the strengths and failure modes of finetuning. Our findings indicate that question-answer training data formats provide much stronger knowledge generalization than document/articlestyle training data, numerical information can be harder for finetuning to retain than categorical information, and models struggle to apply finetuned knowledge during multi-step reasoning even when trained on similar examples--all factors that render "knowledge injection" to be especially difficult, even after controlling for considerations like data augmentation and information volume. On the other hand, our findings also indicate that it is not fundamentally more difficult to finetune information about a real-world event than information about writing style.



SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment

Neural Information Processing Systems

Large Reasoning Models (LRMs) have become powerful tools for complex problem solving, but their structured reasoning pathways can lead to unsafe outputs when exposed to harmful prompts. Existing safety alignment methods reduce harmful outputs but can degrade reasoning depth, leading to significant trade-offs in complex, multi-step tasks, and remain vulnerable to sophisticated jailbreak attacks. To address this, we introduce SAFEPATH, a lightweight alignment method that fine-tunes LRMs to emit a short, 8-token Safety Primer at the start of their reasoning, in response to harmful prompts, while leaving the rest of the reasoning process unsupervised. Empirical results across multiple benchmarks indicate that SAFEPATH effectively reduces harmful outputs while maintaining reasoning performance. Specifically, SAFEPATH reduces harmful responses by up to 90.0% and blocks 83.3% of jailbreak attempts in the DeepSeek-R1-Distill-Llama-8B model, while requiring 295.9x less compute than Direct Refusal and 314.1x less than SafeChain. We further introduce a zero-shot variant that requires no finetuning. In addition, we provide a comprehensive analysis of how existing methods in LLMs generalize, or fail, when applied to reasoning-centric models, revealing critical gaps and new directions for safer AI. We release model and code at https://ai-isl.github.io/safepath.


BitMark: Watermarking Bitwise Autoregressive Image Generative Models

Neural Information Processing Systems

State-of-the-art text-to-image models generate photorealistic images at an unprecedented speed. This work focuses on models that operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as training data--potentially by the very same models. This phenomenon has been shown to lead to model collapse, where repeated training on generated content, especially from the models' own previous versions, causes a gradual degradation in performance. A promising mitigation strategy is watermarking, which embeds human-imperceptible yet detectable signals into generated images--enabling the identification of generated content. In this work, we introduce BitMark, a robust bitwise watermarking framework.