Goto

Collaborating Authors

 Technology


Ascent Fails to Forget

Neural Information Processing Systems

Contrary to common belief, we show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning, a phenomenon we attribute to the inherent statistical dependence between the forget and retain data sets. This dependence, which can manifest itself even as simple correlations, undermines the misconception that these sets can be independently manipulated during unlearning. We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. For random forget sets, this dependence means that degrading forget set metrics (which, for the oracle, should mirror test set metrics) inevitably harms overall test performance. Going beyond random sets, we consider logistic regression as an instructive example where a critical failure mode emerges: inter-set dependence causes gradient descentascent iterations to progressively diverge from the oracle. Strikingly, these methods can converge to solutions that are not only far from the oracle but are potentially even further from it than the original model itself, rendering the unlearning process actively detrimental. A toy example further illustrates how this dependence can trap models in inferior local minima, inescapable via finetuning. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay.


One Head to Rule Them All: Amplifying LVLMSafety through a Single Critical Attention Head

Neural Information Processing Systems

Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in tasks requiring multimodal understanding. However, recent studies indicate that LVLMs are more vulnerable than LLMs to unsafe inputs and prone to generating harmful content. Existing defense strategies primarily include fine-tuning, input sanitization, and output intervention. Although these approaches provide a certain level of protection, they tend to be resource-intensive and struggle to effectively counter sophisticated attack techniques. To tackle such issues, we propose One-head Defense (Oh Defense), a novel yet simple approach utilizing LVLMs' internal safety capabilities. Through systematic analysis of the attention mechanisms, we discover that LVLMs' safety capabilities are concentrated within specific attention heads that respond differently to safe or unsafe inputs. Further exploration reveals that a single critical attention head can effectively serve as a safety guard, providing a strong discriminative signal that amplifies the model's inherent safety capabilities. Hence, the Oh Defense requires no additional training or external modules, making it computationally efficient while effectively reactivating suppressed safety mechanisms. Extensive experiments across diverse LVLM architectures and unsafe datasets validate our approach, i.e., the Oh Defense achieves near-perfect defense success rates (> 98%) for unsafe inputs while maintaining low false positive rates (< 5%) for safe content.


Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Neural Information Processing Systems

Long-horizon video-audio reasoning and fine-grained pixel understanding impose conflicting requirements on omnimodal models: dense temporal coverage demands many low-resolution frames, whereas precise grounding calls for highresolution inputs. We tackle this trade-off with a two-system architecture: a Global Reasoning System selects informative keyframes and rewrites the task at low spatial cost, while a Detail Understanding System performs pixel-level grounding on the selected high-resolution snippets. Because "optimal" keyframe selection and reformulation are ambiguous and hard to supervise, we formulate them as a reinforcement-learning (RL) problem and present Omni-R1, an end-to-end RL framework built on Group Relative Policy Optimization. Omni-R1 trains the Global Reasoning System through hierarchical rewards obtained via online collaboration with the Detail Understanding System, requiring only one epoch of RL on small task splits. Experiments on two challenging benchmarks, Referring Audio-Visual Segmentation (RefAVS) and Reasoning Video Object Segmentation (REVOS), show that Omni-R1 not only surpasses strong supervised baselines but also outperforms specialized state-of-the-art models, while substantially improving out-of-domain generalization and mitigating multimodal hallucination. Our results demonstrate the first successful application of RL to large-scale omnimodal reasoning and highlight a scalable path toward universally foundation models.


Interpreta view of the lighthouseandsky person works at his desk in officedifferent concepts(a)(b)(c)Vision RepresentationLanguage RepresentationConcept Activationthe same concept

Neural Information Processing Systems

However, the interpretability of the alignment component remains uninvestigated due to the difficulty in mapping the semantics of multi-modal representations into a unified concept set. To address this problem, we propose VL-SAE, a sparse autoencoder that encodes vision-language representations into its hidden activations. Each neuron in its hidden layer correlates to a concept represented by semantically similar images and texts, thereby interpreting these representations with a unified concept set. To establish the neuron-concept correlation, we encourage semantically similar representations to exhibit consistent neuron activations during self-supervised training. First, to measure the semantic similarity of multi-modal representations, we perform their alignment in an explicit form based on cosine similarity. Second, we construct the VL-SAE with a distance-based encoder and two modality-specific decoders to ensure the activation consistency of semantically similar representations. Experiments across multiple VLMs (e.g., CLIP, LLaVA) demonstrate the superior capability of VL-SAE in interpreting and enhancing the vision-language alignment. For interpretation, the alignment between vision and language representations can be understood by comparing their semantics with concepts. For enhancement, the alignment can be strengthened by aligning vision-language representations at the concept level, contributing to performance improvements in downstream tasks, including zero-shot image classification and hallucination elimination.


AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time Sarthak Kumar Maharana Saksham Singh Kushwaha Baoming Zhang Adrian Rodriguez Songtao Wei Yapeng Tian

Neural Information Processing Systems

AVROBUSTBENCH comprises four audio-visual benchmark datasets, AUDIOSET-2C, VGGSOUND-2C, KINETICS-2C, and EPICKITCHENS-2C, each incorporating 75 bimodal audio-visual corruptions that are co-occurring and correlated. Through extensive evaluations, we observe that state-of-the-art supervised and severity self-supervised increases.


Differential Privacy for Euclidean Jordan Algebra with Applications to Private Symmetric Cone Programming

Neural Information Processing Systems

In this paper, we study differentially private mechanisms for functions whose outputs lie in a Euclidean Jordan algebra. Euclidean Jordan algebras capture many important mathematical structures and form the foundation of linear programming, second-order cone programming, and semidefinite programming. Our main contribution is a generic Gaussian mechanism for such functions, with sensitivity measured in โ„“2, โ„“1, and โ„“ norms. Notably, this framework includes the important case where the function outputs are symmetric matrices, and sensitivity is measured in the Frobenius, nuclear, or spectral norm. We further derive private algorithms for solving symmetric cone programs under various settings, using a combination of the multiplicative weights update method and our generic Gaussian mechanism. As an application, we present differentially private algorithms for semidefinite programming, resolving a major open question posed by [Hsu, Roth, Roughgarden, and Ullman, ICALP 2014].


ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Neural Information Processing Systems

With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unify diverse tasks across modalities within a single system. Despite this progress, existing opensource frameworks often remain fragile and struggle to support complex real-world applications due to the lack of structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system designed to enable robust and scalable general-purpose generation, built on the ComfyUI platform.


Why the Reflecting Pool Is Full of Algae After Trump's Renovation

WIRED

Why the Reflecting Pool Is Full of Algae After Trump's Renovation Warm weather has fueled a bloom that US National Park Service workers are trying to kill using everything from hydrogen peroxide to nanobubbles ahead of July 4 celebrations. On Wednesday morning, workers poured hydrogen peroxide into the Lincoln Memorial Reflecting Pool in Washington, DC. The treatment is the latest attempt by the Interior Department to control an algae bloom that has turned the pool bright green, despite President Donald Trump's costly renovation to make it "American flag blue" in time for the nation's 250th anniversary . Hot temperatures and climate change are among the risk factors that could be driving the outbreak. The Trump administration spent more than $14 million to update the pool ahead of celebrations across the US capital .


OS-HARM: ABenchmark for Measuring Safety of Computer Use Agents

Neural Information Processing Systems

Computer use agents are LLM-based agents that can directly interact with a graphical user interface, by processing screenshots or accessibility trees. While these systems are gaining popularity, their safety has been largely overlooked, despite the fact that evaluating and understanding their potential for harmful behavior is essential for widespread adoption. To address this gap, we introduce OS-HARM, a new benchmark for measuring safety of computer use agents. OS-HARM is built on top of the OSWorld environment (Xie et al., 2024) and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior.


A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

Neural Information Processing Systems

Despite promising performance on open-source large vision-language models (LVLMs), transfer-based targeted attacks often fail against closed-source commercial LVLMs. Analyzing failed adversarial perturbations reveals that the learned perturbations typically originate from a uniform distribution and lack clear semantic details, resulting in unintended responses. This critical absence of semantic information leads commercial black-box LVLMs to either ignore the perturbation entirely or misinterpret its embedded semantics, thereby causing the attack to fail. To overcome these issues, we propose to refine semantic clarity by encoding explicit semantic details within local regions, thus ensuring the capture of finer-grained features and inter-model transferability, and by concentrating modifications on semantically rich areas rather than applying them uniformly. To achieve this, we propose *a simple yet highly effective baseline*: at each optimization step, the adversarial image is cropped randomly by a controlled aspect ratio and scale, resized, and then aligned with the target image in the embedding space. While the naive source-target matching method has been utilized before in the literature, we are the first to provide a tight analysis, which establishes a close connection between perturbation optimization and semantics. Experimental results confirm our hypothesis. Our adversarial examples crafted with local-aggregated perturbations focused on crucial regions exhibit surprisingly good transferability to commercial LVLMs, including GPT-4.5, GPT-4o, Gemini-2.0-flash,