Singer, Yaron
Adversarial Reasoning at Jailbreaking Time
Sabbaghi, Mahdi, Kassianik, Paul, Pappas, George, Singer, Yaron, Karbasi, Amin, Hassani, Hamed
As large language models (LLMs) are becoming more capable and widespread, the study of their failure cases is becoming increasingly important. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs. We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation that achieves state-of-the-art (SOTA) attack success rates (ASR) against many aligned LLMs, even those that aim to trade inference-time compute for adversarial robustness. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Mehrotra, Anay, Zampetakis, Manolis, Kassianik, Paul, Nelson, Blaine, Anderson, Hyrum, Singer, Yaron, Karbasi, Amin
While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that requires only black-box access to the target LLM. TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks. Tree-of-thought reasoning allows TAP to navigate a large search space of prompts, while pruning reduces the total number of queries sent to the target. In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT-4 and GPT-4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.
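To make the search structure concrete, here is a minimal, non-operational skeleton of a tree-of-thought refinement loop with pruning. Every model interaction (the attacker's refine step, the on-topic pruner, and the judge) is stubbed with a toy random stand-in; the names refine, on_topic_score, and judge_score are hypothetical placeholders for illustration, not TAP's actual interface.

```python
import random

def tap_sketch(goal, width=4, branching=3, depth=5, prune_threshold=0.3):
    """Minimal skeleton of a tree-of-thought attack search with pruning.
    All model calls are toy stubs (random scores); a real system would
    back them with LLM queries. Names here are hypothetical placeholders."""
    refine = lambda p: p + f" [variant {random.randint(0, 999)}]"  # stub attacker step
    on_topic_score = lambda p: random.random()                     # stub pruning judge
    judge_score = lambda p: random.random()                        # stub target judge
    leaves = [goal]
    for _ in range(depth):
        # branch: each leaf spawns several refined candidate prompts
        candidates = [refine(p) for p in leaves for _ in range(branching)]
        # prune before querying the target: drop likely-off-topic candidates
        candidates = [p for p in candidates if on_topic_score(p) > prune_threshold]
        if not candidates:
            break
        # query the target only for survivors and score the outcomes
        scored = sorted(((judge_score(p), p) for p in candidates), reverse=True)
        if scored[0][0] > 0.95:                  # stub success criterion
            return scored[0][1]
        leaves = [p for _, p in scored[:width]]  # keep the best-width frontier
    return None

print(tap_sketch("example benign goal"))
```

The pruning step is the design point: off-topic candidates are discarded before any target query, which is what keeps the total query count small even as the tree widens.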
An Optimal Elimination Algorithm for Learning a Best Arm
Hassidim, Avinatan, Kupfer, Ron, Singer, Yaron
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm, where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$-approximation to that of the highest-mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(\epsilon,\delta)$-learning the mean of $n$ arms separately, and we complement this result with a conditional matching lower bound.
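To make the setting concrete, the sketch below implements the textbook successive-elimination baseline for $(\epsilon,\delta)$-PAC best-arm identification; it illustrates the problem, not the paper's optimal algorithm. The pull callback and the Bernoulli toy arms are assumptions for the example.

```python
import math
import random

def pac_best_arm(pull, n, eps, delta):
    """Textbook successive elimination for (eps, delta)-PAC best-arm
    identification; a baseline that makes the problem concrete, not the
    paper's optimal algorithm. pull(i) returns a reward in [0, 1]."""
    arms = list(range(n))
    means = [0.0] * n
    r = 0
    while len(arms) > 1:
        r += 1
        # uniform confidence radius after r pulls of every surviving arm
        rad = math.sqrt(math.log(4 * n * r * r / delta) / (2 * r))
        for i in arms:
            means[i] = (means[i] * (r - 1) + pull(i)) / r
        best = max(means[i] for i in arms)
        # eliminate arms that are provably not best, keeping eps-optimal ones
        arms = [i for i in arms if means[i] >= best - 2 * rad]
        if 4 * rad <= eps:                 # all survivors are eps-optimal
            break
    return max(arms, key=lambda i: means[i])

# toy use: Bernoulli arms with hidden means
mus = [0.30, 0.50, 0.45, 0.70]
pull = lambda i: 1.0 if random.random() < mus[i] else 0.0
print(pac_best_arm(pull, len(mus), eps=0.1, delta=0.05))
```

The baseline's per-round union bound is exactly the slack the paper targets: its algorithm matches the cost of estimating the $n$ means separately, which this scheme does not.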
The FAST Algorithm for Submodular Maximization
Breuer, Adam, Balkanski, Eric, Singer, Yaron
In this paper we describe a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint $k$, whose approximation ratio is arbitrarily close to $1-1/e$, is $O(\log(n) \log^2(\log k))$ adaptive, and uses a total of $O(n \log\log(k))$ queries. Recent algorithms have comparable guarantees in terms of asymptotic worst-case analysis, but their actual number of rounds and query complexity depend on very large constants and polynomials in terms of precision and confidence, making them impractical for large data sets. Our main contribution is a design that is extremely efficient both in terms of its non-asymptotic worst-case query complexity and number of rounds, and in terms of its practical runtime. In experiments on large data sets, we show that this algorithm outperforms any algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms; FAST is orders of magnitude faster than the state-of-the-art.
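The core mechanic, adaptive sequencing, can be sketched as follows: draw a random sequence of surviving elements, evaluate all prefix values in one parallel round, and commit the longest prefix whose marginal gains clear a descending threshold. This is a simplified rendering that omits FAST's precision/confidence machinery and its guarantees.

```python
import random

def adaptive_sequencing(f, ground, k, eps=0.2):
    """Simplified sketch of FAST-style adaptive sequencing (omits the
    precision/confidence machinery of the real algorithm). f is a set
    function given as f(list) -> float with f([]) = 0."""
    S = []
    d = max(f([e]) for e in ground)
    t = d                                      # descending threshold
    while len(S) < k and t > eps * d / k:
        X = [e for e in ground if e not in S]
        if not X:
            break
        random.shuffle(X)
        seq = X[: k - len(S)]
        base = f(S)
        # one adaptive round: every prefix value can be evaluated in parallel
        prefix = [f(S + seq[: i + 1]) for i in range(len(seq))]
        gains = [prefix[0] - base] + [prefix[i] - prefix[i - 1]
                                      for i in range(1, len(seq))]
        # commit the longest prefix whose marginal gains clear the threshold
        i_star = next((i for i, g in enumerate(gains) if g < t), len(seq))
        if i_star == 0:
            t *= 1 - eps                       # nothing clears t: lower it
        else:
            S += seq[:i_star]
    return S

# toy coverage objective: f(S) = size of the union of the chosen sets
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6, 7}, 4: {8}}
f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
print(adaptive_sequencing(f, list(sets), k=3))
```

Because whole prefixes are committed per round rather than single elements, the number of sequential rounds stays polylogarithmic instead of growing with $k$.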
Predicting Choice with Set-Dependent Aggregation
Rosenfeld, Nir, Oshiba, Kojin, Singer, Yaron
Providing users with alternatives to choose from is an essential component in many online platforms, making the accurate prediction of choice vital to their success. A renewed interest in learning choice models has led to significant progress in modeling power, but most current methods are either limited in the types of choice behavior they capture, cannot be applied to large-scale data, or both. Here we propose a learning framework for predicting choice that is accurate, versatile, theoretically grounded, and scales well. Our key modeling point is that to account for how humans choose, predictive models must capture certain set-related invariances. Building on recent results in economics, we derive a class of models that can express any behavioral choice pattern, enjoy favorable sample complexity guarantees, and can be efficiently trained end-to-end. Experiments on three large choice datasets demonstrate the utility of our approach.
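As one illustrative instantiation of set-dependent aggregation (not the paper's exact architecture), an item's utility can be modulated by a permutation-invariant embedding of the offered set before applying a softmax choice rule; all parameter names below are hypothetical.

```python
import numpy as np

def choice_probs(items, W_item, W_set, v):
    """Illustrative set-dependent utility model: each item's utility is
    modulated by a permutation-invariant embedding of the whole offered
    set, then softmaxed into a choice distribution. An assumption-laden
    sketch, not the paper's exact model. items: (m, d) feature matrix."""
    ctx = np.tanh(items.mean(axis=0) @ W_set)   # set embedding (order-invariant)
    utils = items @ W_item @ ctx + items @ v    # set-dependent + base utility
    e = np.exp(utils - utils.max())
    return e / e.sum()                          # choice probabilities over the set

# toy use with random parameters: 4 alternatives, 5 features each
rng = np.random.default_rng(0)
items = rng.normal(size=(4, 5))
print(choice_probs(items, rng.normal(size=(5, 3)),
                   rng.normal(size=(5, 3)), rng.normal(size=5)))
```

The key property, in the spirit of the abstract, is that an item's score changes when the set it is offered in changes, which a plain per-item utility model cannot express.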
Robust Attacks against Multiple Classifiers
Perdomo, Juan C., Singer, Yaron
We address the challenge of designing optimal adversarial noise algorithms for settings where a learner has access to multiple classifiers. We demonstrate how this problem can be framed as finding strategies at equilibrium in a two-player, zero-sum game between a learner and an adversary, and in doing so we illustrate the need for randomization in adversarial attacks. In order to compute a Nash equilibrium, our main technical focus is the design of best-response oracles that can be implemented within a Multiplicative Weights Update framework to boost deterministic perturbations against a set of models into optimal mixed strategies. We demonstrate the practical effectiveness of our approach on a series of image classification tasks using both linear classifiers and deep neural networks.
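A minimal sketch of the game dynamics, assuming linear classifiers and a simple surrogate best-response oracle (perturb against the weight-averaged classifier, which is our assumption rather than the paper's exact oracle):

```python
import numpy as np

def robust_attack_mwu(W, x, y, eps, T=200, lr=0.5):
    """Sketch of the zero-sum game: the learner runs multiplicative
    weights over its classifiers while the adversary best-responds each
    round; the collected per-round perturbations approximate the
    adversary's optimal mixed strategy. The best-response oracle here
    is a surrogate assumption, not the paper's exact oracle."""
    m = W.shape[0]
    p = np.ones(m) / m                       # learner's mix over classifiers
    perturbations = []
    for _ in range(T):
        w_bar = p @ W                        # learner's mixture classifier
        # l2-bounded best response: push against the averaged decision boundary
        v = -eps * y * w_bar / (np.linalg.norm(w_bar) + 1e-12)
        perturbations.append(v)
        fooled = (y * (W @ (x + v)) <= 0).astype(float)
        p *= np.exp(-lr * fooled)            # downweight classifiers that get fooled
        p /= p.sum()
    return p, perturbations

# toy use: two linear classifiers over 2-d inputs, true label y = +1
W = np.array([[1.0, 0.2], [0.1, 1.0]])
x = np.array([0.5, 0.5])
p, vs = robust_attack_mwu(W, x, y=1, eps=0.6)
print(p, vs[-1])
```

Since MWU is no-regret for the learner and the adversary best-responds each round, the empirical distribution over the adversary's perturbations converges toward its minimax mixed strategy, which is exactly why randomized attacks beat any single deterministic one here.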
Robust Influence Maximization for Hyperparametric Models
Kalimeris, Dimitris, Kaplun, Gal, Singer, Yaron
In this paper we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. In social networks, users influence and are influenced by individuals with similar characteristics, and as such they are associated with features. A recent surge of research in influence maximization focuses on the case where the edge probabilities of the graph are not arbitrary but are generated as a function of the users' features and a global hyperparameter. We propose a model where the objective is to maximize the worst-case number of influenced users over all possible values of that hyperparameter. We provide theoretical results showing that computing a proper robust solution in our model is NP-hard, and give an algorithm that achieves improper robust optimization. We make use of sampling-based techniques and of the well-known multiplicative weight updates algorithm. Additionally, we validate our method empirically and show that it outperforms state-of-the-art robust influence maximization techniques.
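A toy sketch of the improper scheme: the adversary runs multiplicative weight updates over a finite grid of hyperparameter values while the learner greedily best-responds to the resulting mixture of influence models. The sigmoid edge-probability model and the grid of thetas are illustrative assumptions, not the paper's exact construction.

```python
import random
import numpy as np

def ic_spread(graph, probs, seeds, sims=100):
    """Monte Carlo estimate of expected spread in the independent cascade."""
    total = 0
    for _ in range(sims):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, []):
                if v not in active and random.random() < probs[(u, v)]:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / sims

def greedy_seeds(graph, probs, k, nodes):
    """Standard greedy seed selection against one fixed influence model."""
    S = []
    for _ in range(k):
        S.append(max((u for u in nodes if u not in S),
                     key=lambda u: ic_spread(graph, probs, S + [u])))
    return S

def robust_im(graph, feats, thetas, k, T=10, lr=0.5):
    """Improper robust sketch: MWU over a hyperparameter grid (adversary),
    greedy best responses (learner). Returns the learner's iterates; the
    uniform mixture over them is the improper robust solution."""
    sig = lambda z: 1 / (1 + np.exp(-z))
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    w = np.ones(len(thetas)) / len(thetas)
    seed_sets = []
    for _ in range(T):
        # learner best-responds to the adversary's current mixture of models
        mix = {e: float(sum(w[i] * sig(thetas[i] * feats[e])
                            for i in range(len(thetas)))) for e in feats}
        S = greedy_seeds(graph, mix, k, nodes)
        seed_sets.append(S)
        # adversary upweights hyperparameters under which S spreads poorly
        spread = np.array([ic_spread(graph, {e: sig(t * feats[e]) for e in feats}, S)
                           for t in thetas])
        w *= np.exp(-lr * spread / (spread.max() + 1e-12))
        w /= w.sum()
    return seed_sets

# toy use: a 5-node graph with one scalar feature per edge
graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [4]}
feats = {(u, v): random.uniform(-1, 1) for u in graph for v in graph[u]}
print(robust_im(graph, feats, thetas=[-2.0, 0.0, 2.0], k=1))
```

Returning a distribution over seed sets rather than one set is what "improper" means here: no single seed set may be good for every hyperparameter, but a mixture can be.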
Fast Parallel Algorithms for Feature Selection
Qian, Sharon, Singer, Yaron
In this paper, we analyze a fast parallel algorithm to efficiently select and build a set of $k$ random variables from a large set of $n$ candidate elements. This combinatorial optimization problem can be viewed in the context of feature selection for the prediction of a response variable. Using the adaptive sampling technique, which has recently been shown to exponentially speed up submodular maximization algorithms, we propose a new parallelizable algorithm that dramatically speeds up previous selection algorithms by reducing the number of rounds from $\mathcal O(k)$ to $\mathcal O(\log n)$ for objectives that do not conform to the submodularity property. We introduce a new metric to quantify the closeness of the objective function to submodularity and analyze the performance of adaptive sampling in this regime. We also conduct experiments on synthetic and real datasets and show that the empirical performance of adaptive sampling on non-submodular objectives far exceeds its theoretical lower bound. Additionally, the empirical running time improves drastically in all experiments without compromising the terminal value, showing the practicality of adaptive sampling.
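A simplified sketch of the adaptive sampling loop: each round scores all candidates against random blocks (evaluations that can run in parallel), then either commits a whole block or filters weak candidates, giving logarithmically many rounds. The $R^2$ objective in the usage example is illustrative and, as the abstract notes, need not be submodular.

```python
import numpy as np

def adaptive_sampling_select(f, n, k, eps=0.2, m=10, seed=0):
    """Simplified adaptive-sampling sketch: each of O(log n) rounds scores
    every candidate against random blocks (independent evaluations, hence
    parallelizable), then either commits a whole block or filters weak
    candidates. Omits the guarantees of the full algorithm."""
    rng = np.random.default_rng(seed)
    S, cand = [], list(range(n))
    t = f(list(range(n))) / k                      # target per-element density
    for _ in range(int(4 * np.log(max(n, 2))) + 1):
        if len(S) >= k or not cand:
            break
        size = min(k - len(S), len(cand))
        blocks = [list(rng.choice(cand, size, replace=False)) for _ in range(m)]
        best = max(blocks, key=lambda B: f(S + B))
        if (f(S + best) - f(S)) / size >= (1 - eps) * t:
            S += best                              # commit a whole block at once
            cand = [e for e in cand if e not in S]
        else:
            # filter: drop candidates with low average marginal contribution
            gain = {e: np.mean([f(S + B + [e]) - f(S + B)
                                for B in blocks if e not in B] or [0.0])
                    for e in cand}
            cand = [e for e in cand if gain[e] >= (1 - eps) * t]
            if not cand:                           # everything filtered:
                t *= 1 - eps                       # relax the density target
                cand = [e for e in range(n) if e not in S]
    return S[:k]

# toy use: choose k regression features; the R^2 objective need not be submodular
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))
w = np.zeros(12); w[:3] = [2.0, -1.0, 1.0]
y = X @ w + 0.1 * rng.normal(size=200)
def r2(idx):
    if not idx:
        return 0.0
    A = X[:, sorted(set(idx))]
    res = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1.0 - res @ res / ((y - y.mean()) @ (y - y.mean()))
print(adaptive_sampling_select(r2, n=12, k=3))
```

Adding and discarding elements in blocks, rather than one at a time, is what collapses the $\mathcal O(k)$ sequential rounds of greedy selection to $\mathcal O(\log n)$.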
Non-monotone Submodular Maximization in Exponentially Fewer Iterations
Balkanski, Eric, Breuer, Adam, Singer, Yaron
In this paper we consider parallelization for applications whose objective can be expressed as maximizing a non-monotone submodular function under a cardinality constraint. Our main result is an algorithm whose approximation is arbitrarily close to $1/2e$ in $O(\log^2 n)$ adaptive rounds, where $n$ is the size of the ground set. This is an exponential speedup in parallel running time over any previously studied algorithm for constrained non-monotone submodular maximization. Beyond its provable guarantees, the algorithm performs well in practice. Specifically, experiments on traffic monitoring and personalized data summarization applications show that the algorithm finds solutions whose values are competitive with state-of-the-art algorithms while running in exponentially fewer parallel iterations.
Optimization for Approximate Submodularity
Singer, Yaron, Hassidim, Avinatan
We consider the problem of maximizing a submodular function when given access to its approximate version. Submodular functions are heavily studied in a wide variety of disciplines, since they are used to model many real-world phenomena and are amenable to optimization. However, there are many cases in which the phenomena we observe are only approximately submodular, and the approximation guarantees cease to hold. We describe a technique, which we call the sampled mean approximation, that yields strong guarantees for maximization of submodular functions from approximate surrogates under cardinality and intersection-of-matroid constraints. In particular, we show tight guarantees for maximization under a cardinality constraint and a $1/(1+P)$ approximation under an intersection of $P$ matroids.
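One way to picture the sampled mean idea: average the approximate oracle's marginals over random bundles before making greedy comparisons, so that pointwise noise in the surrogate is damped. The sketch below is a simplified rendering under that reading, not the paper's exact procedure; the bundle size and sample count are assumptions.

```python
import random

def sampled_mean_greedy(f_approx, ground, k, m=25, seed=0):
    """Sketch of sampled-mean style smoothing: marginals of the approximate
    oracle are averaged over random small bundles before greedy comparisons,
    damping pointwise approximation error. A simplified rendering of the
    idea, not the paper's exact procedure."""
    rnd = random.Random(seed)
    def smoothed_gain(S, e):
        avail = [x for x in ground if x not in S and x != e]
        total = 0.0
        for _ in range(m):
            H = rnd.sample(avail, min(2, len(avail)))   # random bundle
            total += f_approx(S + H + [e]) - f_approx(S + H)
        return total / m
    S = []
    for _ in range(k):
        S.append(max((e for e in ground if e not in S),
                     key=lambda e: smoothed_gain(S, e)))
    return S

# toy use: a coverage function observed through multiplicative noise
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5, 6}, 3: {1, 6, 7}}
def f_approx(S):
    exact = len(set().union(*(sets[e] for e in S))) if S else 0
    return exact * random.uniform(0.9, 1.1)             # approximate oracle
print(sampled_mean_greedy(f_approx, list(sets), k=2))
```

Plain greedy on the noisy oracle can be misled by a single bad evaluation; averaging over bundles is the smoothing that restores reliable marginal comparisons.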