adversary
Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning
Dann, Christoph, Mansour, Yishay, Mohri, Mehryar
Model-based reinforcement learning (MBRL) agents typically learn world models by minimizing predictive loss. However, powerful RL optimizers inevitably exploit minor model inaccuracies, leading to simulator exploitation and a reality gap where policies succeed in simulation but fail in the real world. We propose that the objective for learning simulators should be strategic robustness rather than predictive accuracy, and formulate this as a zero-sum minimax game between a model player and an adversarial policy player. We provide a comprehensive theoretical analysis: (1) an online learning guarantee showing the game is learnable with sublinear regret bounds; (2) a tractable critic-based simplification bounding the global policy-value gap by the local critic's loss; and (3) an Error-MDP duality, proving that finding the worst-case policy is formally dual to a standard RL problem where the reward is the one-step critic error. This duality yields a provably convergent active data selection algorithm. Experiments on continuous control tasks demonstrate that our approach reduces prediction error in strategically important regions by $1.5$-$2.2\times$ and enables policies trained purely in simulation to match near-optimal real-world performance.
On the Sample Complexity of Robust Binary Hypothesis Testing
Vallinayagam, Shankar, Pensia, Ankit, Jog, Varun
We study the sample complexity of robust binary hypothesis testing under three standard contamination models: $\varepsilon$-additive (Huber), $\varepsilon$-subtractive, and $\varepsilon$-total variation (TV), denoted by $n^*_{\mathrm{Hub}}(\varepsilon)$, $n^*_{\mathrm{Sub}}(\varepsilon)$, and $n^*_{\mathrm{TV}}(\varepsilon)$, respectively. For subtractive contamination, we show that least favourable distributions exist and provide explicit formulas for the same, bringing this model in line with the classical Huber and TV models. Next we show that in all three models, sample complexity may be highly unstable in the contamination parameter $\varepsilon$, increasing by polynomial factors even for $o(\varepsilon)$ perturbations. Similarly, there may be polynomial factor gaps between the sample complexities when $\varepsilon$ is known exactly versus when it is known up to $o(\varepsilon)$ error. Despite the instability of the sample complexity in all models, we show that the sample complexities across models are comparable up to constant-factor rescaling of $\varepsilon$. Specifically, for any fixed $\delta_0>0$, the following hold for all distributions $p$ and $q$: (i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$, (ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+\delta_0)\varepsilon)$, and (iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+\delta_0)\varepsilon)$, and the scaling constants are tight. Finally, we extend our results to adaptive versions of the contamination models.
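To make the three contamination models concrete, the sketch below checks whether a distribution on a finite alphabet is a valid $\varepsilon$-contamination of $p$ under each model. It assumes common textbook formulations, which may differ in detail from the paper's definitions: Huber contamination as $(1-\varepsilon)p + \varepsilon r$ for an arbitrary distribution $r$, subtractive contamination as deleting $\varepsilon$ mass from $p$ and renormalizing, and TV contamination as any distribution within total variation distance $\varepsilon$ of $p$. The function names and the example distributions are hypothetical.

```python
TOL = 1e-12  # numerical slack for floating-point comparisons


def tv_distance(p, q):
    """Total variation distance between two distributions on the same alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def in_huber(p, q, eps):
    # q = (1 - eps) * p + eps * r for some distribution r
    # is equivalent to q >= (1 - eps) * p pointwise (assumed formulation).
    return all(b >= (1 - eps) * a - TOL for a, b in zip(p, q))


def in_subtractive(p, q, eps):
    # q is p with eps mass deleted and then renormalized,
    # which is equivalent to (1 - eps) * q <= p pointwise (assumed formulation).
    return all((1 - eps) * b <= a + TOL for a, b in zip(p, q))


def in_tv(p, q, eps):
    return tv_distance(p, q) <= eps + TOL


# Hypothetical example on a three-letter alphabet.
p = [0.5, 0.3, 0.2]
eps = 0.1
# Huber: the adversary adds eps mass on the last outcome.
q_hub = [(1 - eps) * a + eps * r for a, r in zip(p, [0.0, 0.0, 1.0])]
# Subtractive: the adversary deletes eps mass from the first outcome.
q_sub = [0.4 / 0.9, 0.3 / 0.9, 0.2 / 0.9]

for name, q in (("huber", q_hub), ("subtractive", q_sub)):
    print(name, in_huber(p, q, eps), in_subtractive(p, q, eps), in_tv(p, q, eps))
```

Under these formulations both the Huber and the subtractive contamination sets sit inside the TV ball of radius $\varepsilon$, which matches the direction of the first inequalities in (i) and (ii): a TV adversary can do anything the other two can.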
Robust Statistical Estimators with Bounded Empirical Sensitivity
Iverson, Valentio, Kamath, Gautam, Mouzakis, Argyris, Smith, Adam
We introduce a new measure of robustness for statistical estimators, which we call \emph{empirical sensitivity}. An estimator $\hat{\theta}$ has bounded empirical sensitivity if, with high probability over a dataset $X = (X_1, \dots, X_n) \sim \mathcal{D}^{\otimes n}$, for any dataset $Y$ obtained by modifying at most $\eta n$ points in $X$, we have that $\hat{\theta}(Y)$ is close to $\hat{\theta}(X)$. We study bounds on this quantity for the prototypical problem of Gaussian mean estimation. We prove new lower bounds, showing that for any estimator $\hat{\mu}$ which achieves an optimal $\ell_2$-error bound of $O\left(\sqrt{d/n}\right)$, the empirical sensitivity is at least $\Omega\left(\eta + \sqrt{\eta d/n}\right)$. The two terms arise due to obstructions on the mean and variance (via an Efron-Stein argument) of such an estimator. We show that this bound is tight up to logarithmic factors, by employing recent results for robust empirical mean estimation.
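As an illustration of the definition only, the sketch below computes the exact worst-case change of the one-dimensional sample median when at most $k = \lfloor \eta n \rfloor$ points of a dataset are replaced. The median is a stand-in chosen because its worst case has a closed form in terms of order statistics; it is not the estimator analysed in the paper, and the function name is hypothetical. The sample mean, by contrast, can be moved arbitrarily far by replacing a single point.

```python
import random


def worst_case_median_shift(xs, k):
    """Largest possible change in the median of xs when at most k points
    are replaced by arbitrary values.

    Assumes len(xs) is odd and k <= (len(xs) - 1) // 2, so the median is a
    single order statistic and the adversary cannot push it to infinity.
    """
    s = sorted(xs)
    m = (len(s) - 1) // 2  # index of the median
    if len(s) % 2 == 0 or not 0 <= k <= m:
        raise ValueError("need odd length and 0 <= k <= (n - 1) / 2")
    # Replacing the k smallest points by huge values moves the median up to
    # s[m + k]; replacing the k largest by tiny values moves it down to s[m - k].
    return max(s[m + k] - s[m], s[m] - s[m - k])


if __name__ == "__main__":
    random.seed(0)
    n = 1001
    data = [random.gauss(0.0, 1.0) for _ in range(n)]
    for eta in (0.01, 0.05, 0.10):
        k = int(eta * n)
        print(f"eta={eta:.2f}  k={k}  shift={worst_case_median_shift(data, k):.4f}")
```

For standard normal data the printed shift should grow roughly linearly in $\eta$, which is the behaviour of the first term in the lower bound when $d = 1$.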
f8928b073ccbec15d35f2a9d39430bfd-Supplemental-Conference.pdf
Our experiments in Section 3 and Section 4 were conducted with an adversary who has side information about the target point. Here, we reduce the amount of background knowledge the adversary has about the target, and measure how this affects the reconstruction upper bound and attack success. We do this in the following set-up: Given a target $z$, we initialize our reconstruction from uniform noise and optimize with the gradient-based reconstruction attack introduced in Section 2 to produce $\hat{z}$.
Lower Bounds on Adversarial Robustness from Optimal Transport
Arjun Nitin Bhagoji, Daniel Cullina, Prateek Mittal
While progress has been made in understanding the robustness of machine learning classifiers to test-time adversaries (evasion attacks), fundamental questions remain unresolved. In this paper, we use optimal transport to characterize the minimum possible loss in an adversarial classification scenario. In this setting, an adversary receives a random labeled example from one of two classes, perturbs the example subject to a neighborhood constraint, and presents the modified example to the classifier. We define an appropriate cost function such that the minimum transportation cost between the distributions of the two classes determines the minimum 0-1 loss for any classifier. When the classifier comes from a restricted hypothesis class, the optimal transportation cost provides a lower bound. We apply our framework to the case of Gaussian data with norm-bounded adversaries and explicitly show matching bounds for the classification and transport problems as well as the optimality of linear classifiers. We also characterize the sample complexity of learning in this setting, deriving and extending previously known results as a special case. Finally, we use our framework to study the gap between the optimal classification performance possible and that currently achieved by state-of-the-art robustly trained neural networks for datasets of interest, namely, MNIST, Fashion MNIST and CIFAR-10.
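The Gaussian case can be made concrete with a small sketch. It assumes a symmetric isotropic special case that the abstract does not spell out: classes $\mathcal{N}(+\mu, \sigma^2 I)$ and $\mathcal{N}(-\mu, \sigma^2 I)$ with equal priors, and an $\ell_2$ adversary with budget $\beta$. Under these assumptions the linear classifier $\mathrm{sign}(\langle \mu, x \rangle)$ is fooled exactly when the projection of $x$ onto $\mu / \|\mu\|$ lies within $\beta$ of the decision boundary, so its adversarial 0-1 loss is $\Phi\left((\beta - \|\mu\|)/\sigma\right)$. The function names are hypothetical, and the code computes this linear-classifier error directly rather than the paper's transport cost.

```python
import math


def std_normal_cdf(t):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def linear_robust_error(mu_norm, sigma, beta):
    """Adversarial 0-1 loss of sign(<mu, x>) for classes N(+mu, sigma^2 I) and
    N(-mu, sigma^2 I) with equal priors under an l2 budget beta.

    Assumed setting for illustration; mu_norm is the Euclidean norm of mu.
    """
    if sigma <= 0 or mu_norm < 0 or beta < 0:
        raise ValueError("need sigma > 0, mu_norm >= 0, beta >= 0")
    # The projection of x onto mu/|mu| is N(mu_norm, sigma^2) for the +1 class;
    # the adversary succeeds when that projection is at most beta.
    return std_normal_cdf((beta - mu_norm) / sigma)


if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0, 2.0):
        print(f"beta={beta:.1f}  robust error={linear_robust_error(2.0, 1.0, beta):.4f}")
```

With $\beta = 0$ this reduces to the usual Bayes error $\Phi(-\|\mu\|/\sigma)$, and the error reaches $1/2$ once the budget equals $\|\mu\|$, at which point the adversary can move every example onto the decision boundary.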
02bf86214e264535e3412283e817deaa-AuthorFeedback.pdf
We thank the reviewers for their insightful feedback, and we appreciate the opportunity to improve our paper. We will address typos and notational inconsistencies in the updated version. Response to Reviewer 1: We would like to emphasize that Theorem 1 is the most important contribution of our paper due to its generality. By considering the set of all possible classifiers, it provides lower bounds on adversarial robustness for any pair of class-conditional distributions. As we show in our experimental results in Section 6, we are able to obtain lower bounds for arbitrary real-world datasets by constructing the empirical distribution for these. In our estimation, these results serve to provide theoretical validation for adversarial training for low perturbation budgets as well as to highlight the gap to optimality for higher budgets.