Goto

Collaborating Authors

 Cohen-Addad, Vincent


REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective

arXiv.org Artificial Intelligence

To circumvent the alignment of large language models (LLMs), current optimization-based adversarial attacks usually craft adversarial prompts by maximizing the likelihood of a so-called affirmative response. An affirmative response is a manually designed start of a harmful answer to an inappropriate request. While it is often easy to craft prompts that yield a substantial likelihood for the affirmative response, the attacked model frequently does not complete the response in a harmful manner. Moreover, the affirmative objective is usually not adapted to model-specific preferences and essentially ignores the fact that LLMs output a distribution over responses. If low attack success under such an objective is taken as a measure of robustness, the true robustness might be grossly overestimated. To alleviate these flaws, we propose an adaptive and semantic optimization problem over the population of responses. We derive a generally applicable objective via the REINFORCE policy-gradient formalism and demonstrate its efficacy with the state-of-the-art jailbreak algorithms Greedy Coordinate Gradient (GCG) and Projected Gradient Descent (PGD). For example, our objective doubles the attack success rate (ASR) on Llama3 and increases the ASR from 2% to 50% with circuit breaker defense.


The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence

arXiv.org Artificial Intelligence

The safety alignment of large language models (LLMs) can be circumvented through adversarially crafted inputs, yet the mechanisms by which these attacks bypass safety barriers remain poorly understood. Prior work suggests that a single refusal direction in the model's activation space determines whether an LLM refuses a request. In this study, we propose a novel gradient-based approach to representation engineering and use it to identify refusal directions. Contrary to prior work, we uncover multiple independent directions and even multi-dimensional concept cones that mediate refusal. Moreover, we show that orthogonality alone does not imply independence under intervention, motivating the notion of representational independence that accounts for both linear and non-linear effects. Using this framework, we identify mechanistically independent refusal directions. We show that refusal mechanisms in LLMs are governed by complex spatial structures and identify functionally independent directions, confirming that multiple distinct mechanisms drive refusal behavior. Our gradient-based approach uncovers these mechanisms and can further serve as a foundation for future work on understanding LLMs.


A Tight VC-Dimension Analysis of Clustering Coresets with Applications

arXiv.org Artificial Intelligence

We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p}\min_{c\in C}dist(p,C)$. Given a point set $P$, a coreset $\Omega$ is a small weighted subset that approximates the cost of $P$ for all candidate solutions $C$ up to a $(1\pm\varepsilon )$ multiplicative factor. In this paper, we give a sharp VC-dimension based analysis for coreset construction. As a consequence, we obtain improved $k$-median coreset bounds for the following metrics: Coresets of size $\tilde{O}\left(k\varepsilon^{-2}\right)$ for shortest path metrics in planar graphs, improving over the bounds $\tilde{O}\left(k\varepsilon^{-6}\right)$ by [Cohen-Addad, Saulpic, Schwiegelshohn, STOC'21] and $\tilde{O}\left(k^2\varepsilon^{-4}\right)$ by [Braverman, Jiang, Krauthgamer, Wu, SODA'21]. Coresets of size $\tilde{O}\left(kd\ell\varepsilon^{-2}\log m\right)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with curves of length at most $\ell$ with respect to Frechet metrics, improving over the bounds $\tilde{O}\left(k^3d\ell\varepsilon^{-3}\log m\right)$ by [Braverman, Cohen-Addad, Jiang, Krauthgamer, Schwiegelshohn, Toftrup, and Wu, FOCS'22] and $\tilde{O}\left(k^2d\ell\varepsilon^{-2}\log m \log |P|\right)$ by [Conradi, Kolbe, Psarros, Rohde, SoCG'24].


Dynamic Correlation Clustering in Sublinear Update Time

arXiv.org Artificial Intelligence

Clustering is a cornerstone of contemporary machine learning and data analysis. A successful clustering algorithm partitions data elements so that similar items reside within the same group, while dissimilar items are separated. Introduced in 2004 by Bansal, Blum and Chawla Bansal et al. ((2004)), the correlation clustering objective offers a natural approach to model this problem. Due to its concise and elegant formulation, this problem has drawn significant interest from researchers and practitioners, leading to applications across diverse domains. These include ensemble clustering identification ((Bonchi et al., 2013)), duplicate detection ((Arasu et al., 2009)), community mining ((Chen et al., 2012)), disambiguation tasks ((Kalashnikov et al., 2008)), automated labeling ((Agrawal et al., 2009; Chakrabarti et al., 2008)), and many more. In the correlation clustering problem we are given a graph where each edge has either a positive or negative label, and where a positive edge (u, v) indicates that u, v are similar elements (and a negative edge (u, v) indicates that u, v are dissimilar), the objective is to compute a partition of the graph that minimizes the number of negative edges within clusters plus positive edges between clusters. Since the problem is NP-hard, researchers have focused on designing approximation algorithms. The algorithm proposed by Cao et al. ((2024)) achieves an approximation ratio of 1.43 + ϵ, improving upon the previous 1.73 + ϵ and 1.994 + ϵ achieved by Cohen-Addad et al. ((2023, 2022b)). Prior to these developments, the best approximation guarantee of 2.06 was attained by the algorithm of Chawla et al. ((2015)).


Perturb-and-Project: Differentially Private Similarities and Marginals

arXiv.org Artificial Intelligence

We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first design novel efficient algorithms to privately release pair-wise cosine similarities. Second, we derive a novel algorithm to compute $k$-way marginal queries over $n$ features. Prior work could achieve comparable guarantees only for $k$ even. Furthermore, we extend our results to $t$-sparse datasets, where our efficient algorithms yields novel, stronger guarantees whenever $t\le n^{5/6}/\log n\,.$ Finally, we provide a theoretical perspective on why \textit{fast} input perturbation algorithms works well in practice. The key technical ingredients behind our results are tight sum-of-squares certificates upper bounding the Gaussian complexity of sets of solutions.


A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

arXiv.org Artificial Intelligence

We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC'12], where, given a random bipartite graph with $\alpha$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e. all adversarial changes are \textit{monotone} with respect to the bipartition). For this model, a polynomial time algorithm is known to approximate the Balanced Cut problem up to value $O(\alpha)$ [MMV'12] as long as the cut $(A, B)$ has size $\Omega(\alpha)$. However, it consists of slow subroutines requiring optimal solutions for logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves similar performances to that of [MMV'12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(\alpha)$. Our approach appears easily extendible to related problem, such as Sparsest Cut, and also yields an near-linear time $O(1)$-approximation to Dagupta's objective function for hierarchical clustering [Dasgupta, STOC'16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM'19].


Multi-View Stochastic Block Models

arXiv.org Machine Learning

Graph clustering is a central topic in unsupervised learning with a multitude of practical applications. In recent years, multi-view graph clustering has gained a lot of attention for its applicability to real-world instances where one has access to multiple data sources. In this paper we formalize a new family of models, called \textit{multi-view stochastic block models} that captures this setting. For this model, we first study efficient algorithms that naively work on the union of multiple graphs. Then, we introduce a new efficient algorithm that provably outperforms previous approaches by analyzing the structure of each graph separately. Furthermore, we complement our results with an information-theoretic lower bound studying the limits of what can be done in this model. Finally, we corroborate our results with experimental evaluations.


Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

arXiv.org Artificial Intelligence

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and sensitivity sampling. Assuming access to an embedding representation of the data with respect to which the model loss is H\"older continuous, our approach provably allows selecting a set of ``typical'' $k + 1/\varepsilon^2$ elements whose average loss corresponds to the average loss of the whole dataset, up to a multiplicative $(1\pm\varepsilon)$ factor and an additive $\varepsilon \lambda \Phi_k$, where $\Phi_k$ represents the $k$-means cost for the input embeddings and $\lambda$ is the H\"older constant. We furthermore demonstrate the performance and scalability of our approach on fine-tuning foundation models and show that it outperforms state-of-the-art methods. We also show how it can be applied on linear regression, leading to a new sampling strategy that surprisingly matches the performances of leverage score sampling, while being conceptually simpler and more scalable.


A Scalable Algorithm for Individually Fair K-means Clustering

arXiv.org Artificial Intelligence

We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in ~$O(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.


A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies

arXiv.org Machine Learning

Multi-dimensional Scaling (MDS) is a family of methods for embedding pair-wise dissimilarities between $n$ objects into low-dimensional space. MDS is widely used as a data visualization tool in the social and biological sciences, statistics, and machine learning. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i , j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes \[ \text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right] \] Despite its popularity, our theoretical understanding of MDS is extremely limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai, which achieves an embedding with cost $\text{OPT} +\epsilon$ in $n^2 \cdot 2^{\tilde{\mathcal{O}}(k \Delta^4 / \epsilon^2)}$ time, where $\Delta$ is the aspect ratio of the input dissimilarities. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$: for target dimension $k$, we achieve a solution with cost $\mathcal{O}(\text{OPT}^{ \hspace{0.04in}1/k } \cdot \log(\Delta/\epsilon) )+ \epsilon$ in time $n^{ \mathcal{O}(1)} \cdot 2^{\tilde{\mathcal{O}}( k^2 (\log(\Delta)/\epsilon)^{k/2 + 1} ) }$. Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio $\Delta$. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.