2a50e9c2d6b89b95bcb416d6857f8b45-Reviews.html

Neural Information Processing Systems

The authors propose an efficient scheme for solving LP relaxations of combinatorial optimization problems. Their contribution is a novel scheme, together with an analysis that takes into account the original goal of constructing an integral feasible solution from the relaxed solution. They prove that an approximate solution suffices to construct an integral alpha-approximate solution to the vertex cover problem. They also prove a convergence result for an algorithm that solves a suitably constructed QP approximation to a general standard-form LP. The proposed method is evaluated experimentally on a number of combinatorial optimization problems and shown to be competitive with CPLEX, a state-of-the-art LP solver.
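The reviewed paper's own scheme is not reproduced here, but the classical rounding step it builds on can be sketched: for vertex cover, any feasible fractional LP solution thresholded at 1/2 yields an integral cover of size at most twice the LP value, i.e. a 2-approximation (alpha = 2). The graph and fractional solution below are illustrative, not taken from the paper.

```python
def round_vertex_cover(x, edges, threshold=0.5):
    """LP-rounding step: for each edge (u, v), feasibility x[u] + x[v] >= 1
    forces max(x[u], x[v]) >= 1/2, so thresholding at 1/2 yields a cover
    of size at most 2 * (LP value): a 2-approximation (alpha = 2)."""
    cover = {v for v, val in enumerate(x) if val >= threshold}
    assert all(u in cover or w in cover for u, w in edges), "not a cover"
    return cover

# Triangle graph; the optimal fractional solution puts 1/2 on every vertex.
edges = [(0, 1), (1, 2), (0, 2)]
x_frac = [0.5, 0.5, 0.5]                      # LP objective value = 1.5
cover = round_vertex_cover(x_frac, edges)
print(sorted(cover))                          # [0, 1, 2], size 3 <= 2 * 1.5
```

The same thresholding argument is why an *approximate* LP solution is enough: a small feasibility slack only perturbs the threshold, which is the observation the reviewed analysis makes precise.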




data points changes the norms of all vectors, while the norms are very important quantities in the

Neural Information Processing Systems

Shifting the data points is a good idea, but it might cause problems. In our current work, we focus on theory and on datasets satisfying Assumption 1. We will rephrase the sentence as follows: "In these scenarios, ..." In the present work, we aim to improve the efficiency of the MIPS problem from an algorithmic perspective. The GPU can process multiple queries in parallel (Algorithm 1), and the indices of visited vertices can be arbitrarily large. We will add extra discussion in the paper and leave the details for future work. Thank you for finding our work interesting. The approach has been improved and applied to various search tasks. The goal of our paper is to fill this gap. Thank you very much for the highly encouraging comments. We address your concern about the normality assumption in our response to Reviewer 1; the normality assumption is indeed not necessary. We appreciate your detailed and thorough summary of our work. We will also change "M


SafeConstellations: Steering LLM Safety to Reduce Over-Refusals Through Task-Specific Trajectory

Maskey, Utsav, Yadav, Sumit, Dras, Mark, Naseem, Usman

arXiv.org Artificial Intelligence

LLMs increasingly exhibit over-refusal behavior, where safety mechanisms cause models to reject benign instructions that superficially resemble harmful content. This phenomenon diminishes utility in production applications that repeatedly rely on common prompt templates or that frequently rely on LLMs for specific tasks (e.g., sentiment analysis, language translation). Through comprehensive evaluation, we demonstrate that LLMs still tend to refuse responses to harmful instructions when those instructions are reframed to appear as benign tasks. Our mechanistic analysis reveals that LLMs follow distinct "constellation" patterns in embedding space as representations traverse layers, with each task maintaining consistent trajectories that shift predictably between refusal and non-refusal cases. We introduce SafeConstellations, an inference-time trajectory-shifting approach that tracks task-specific trajectory patterns and guides representations toward non-refusal pathways. By selectively guiding model behavior only on tasks prone to over-refusal, while preserving general model behavior, our method reduces over-refusal rates by up to 73% with minimal impact on utility, offering a principled approach to mitigating over-refusals.
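The abstract describes the intervention only at a high level. As a hedged illustration, one common family of inference-time steering computes a per-task mean-difference direction between non-refusal and refusal representations and adds a scaled copy of it to hidden states, applied only on tasks prone to over-refusal. Everything below (function names, vectors, the scale alpha) is a hypothetical sketch, not the authors' exact procedure.

```python
def steering_direction(non_refusal_reps, refusal_reps):
    """Per-task steering vector: mean non-refusal representation minus
    mean refusal representation (hypothetical mean-difference sketch)."""
    dim = len(non_refusal_reps[0])
    mean = lambda reps, i: sum(r[i] for r in reps) / len(reps)
    return [mean(non_refusal_reps, i) - mean(refusal_reps, i) for i in range(dim)]

def steer(hidden, direction, alpha=0.5):
    """Nudge a hidden state along the non-refusal direction; applied
    selectively, only for tasks identified as prone to over-refusal."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy 2-d representations for one task.
d = steering_direction([[1.0, 0.0], [1.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]])
print(steer([0.5, 0.5], d, alpha=0.5))  # hidden state shifted along d
```

Restricting the shift to the affected tasks is what preserves general model behavior, matching the selectivity the abstract emphasizes.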


ba95d78a7c942571185308775a97a3a0-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank the reviewers for their constructive comments. Below, we respond to their main comments. Note that each motif exhibits some distinct properties and can be considered a graph feature. With this in mind, we are not sure it is worth adding this baseline to the paper. We will make this clear in the revised manuscript.


Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework

Kunievsky, Nadav, Evans, James A.

arXiv.org Artificial Intelligence

Understanding whether large language models (LLMs) possess a world model, a structured understanding of the world that supports generalization beyond surface-level patterns, is central to assessing their reliability, especially in high-stakes applications. We propose a formal framework for evaluating whether an LLM exhibits a sufficiently robust world model, defined as producing consistent outputs across semantically equivalent prompts while distinguishing between prompts that express different intents. We introduce an evaluation approach that decomposes model response variability into three components: variability due to user purpose, user articulation, and model instability. An LLM with a strong world model should attribute most of the variability in its responses to changes in foundational purpose rather than superficial changes in articulation. This approach allows us to quantify how much of a model's behavior is semantically grounded rather than driven by model instability or alternative wording. We apply this framework to evaluate LLMs across diverse domains. Our results show that larger models attribute a greater share of output variability to changes in user purpose, indicating a more robust world model. This improvement is not uniform, however: larger models do not consistently outperform smaller ones across all domains, and their advantage in robustness is often modest. These findings highlight the importance of moving beyond accuracy-based benchmarks toward semantic diagnostics that more directly assess the structure and stability of a model's internal understanding of the world.
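The three-way decomposition the abstract describes can be illustrated with a nested law-of-total-variance computation: score responses, group them by user purpose, then by articulation (phrasing) within purpose, and attribute the residual within-phrasing spread to model instability. The data structure, scores, and function name below are hypothetical, not the paper's implementation.

```python
import statistics

def decompose_variance(samples):
    """samples: {purpose: {phrasing: [repeated response scores]}}.
    Nested law-of-total-variance split of per-response score variance
    into purpose, articulation, and instability components; the three
    returned terms sum to the overall population variance."""
    scores = [s for phr in samples.values() for reps in phr.values() for s in reps]
    grand, n = statistics.fmean(scores), len(scores)
    purpose_ss = articulation_ss = instability_ss = 0.0
    for phrasings in samples.values():
        p_scores = [s for reps in phrasings.values() for s in reps]
        p_mean = statistics.fmean(p_scores)
        purpose_ss += len(p_scores) * (p_mean - grand) ** 2
        for reps in phrasings.values():
            a_mean = statistics.fmean(reps)
            articulation_ss += len(reps) * (a_mean - p_mean) ** 2
            instability_ss += sum((s - a_mean) ** 2 for s in reps)
    return purpose_ss / n, articulation_ss / n, instability_ss / n

# Hypothetical scores: two purposes, two phrasings each, two repeats each.
samples = {
    "refund": {"v1": [1.0, 1.2], "v2": [0.9, 1.1]},
    "cancel": {"v1": [2.0, 2.2], "v2": [2.1, 1.9]},
}
purpose, articulation, instability = decompose_variance(samples)
print(purpose, articulation, instability)  # roughly 0.25, 0.0025, 0.01
```

In this toy example almost all variance is attributed to purpose, which is the signature the framework associates with a robust world model.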


STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings

Rastogi, Saksham, Maini, Pratyush, Pruthi, Danish

arXiv.org Artificial Intelligence

Given how large parts of publicly available text are crawled to pretrain large language models (LLMs), data creators increasingly worry about the inclusion of their proprietary data in model training without attribution or licensing. Their concerns are also shared by benchmark curators whose test sets might be compromised. In this paper, we present STAMP, a framework for detecting dataset membership, i.e., determining the inclusion of a dataset in the pretraining corpora of LLMs. Given an original piece of content, our proposal involves first generating multiple rephrases, each embedding a watermark with a unique secret key. One version is released publicly, while the others are kept private. Subsequently, creators can compare model likelihoods between public and private versions using paired statistical tests to prove membership. We show that our framework can successfully detect contamination across four benchmarks which appear only once in the training data and constitute less than 0.001% of the total tokens, outperforming several contamination detection and dataset inference baselines. We verify that STAMP preserves both the semantic meaning and utility of the original data. We apply STAMP to two real-world scenarios to confirm the inclusion of paper abstracts and blog articles in the pretraining corpora.
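The abstract names paired statistical tests without specifying them. As an illustrative stand-in (not necessarily the paper's exact test), a one-sided sign-flip permutation test on per-document log-likelihood differences captures the idea: if the released version was memorized during pretraining, the model should systematically assign it higher likelihood than the held-out private rephrases. All log-likelihood values below are made up.

```python
from itertools import product
from statistics import fmean

def paired_permutation_pvalue(public_ll, private_ll):
    """One-sided exact sign-flip permutation test (illustrative sketch).
    H0: the model scores public and private rephrases equally; a small
    p-value suggests the public version was seen during pretraining."""
    diffs = [p - q for p, q in zip(public_ll, private_ll)]
    observed = fmean(diffs)
    hits = sum(
        fmean([s * d for s, d in zip(signs, diffs)]) >= observed
        for signs in product((1, -1), repeat=len(diffs))
    )
    return hits / 2 ** len(diffs)

# Hypothetical per-document log-likelihoods for 6 paired documents;
# the public versions score uniformly higher, so only the identity
# sign assignment matches the observed mean: p = 1/64.
public_ll = [-3.1, -2.8, -3.5, -2.9, -3.0, -3.2]
private_ll = [-3.4, -3.1, -3.6, -3.3, -3.2, -3.5]
print(paired_permutation_pvalue(public_ll, private_ll))  # 0.015625
```

The exact enumeration of all 2^n sign flips is only practical for small n; with many paired documents one would switch to a sampled permutation test or a paired t-test.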