Fanny Yang
Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness
Fanny Yang, Zuowen Wang, Christina Heinze-Deml
This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy for worst-case spatial transformations (spatial robustness). Evaluated on these adversarially transformed examples, we demonstrate that adding regularization on top of standard augmented or adversarial training reduces the relative robust error on CIFAR-10 by 20% with minimal computational overhead. Similar relative gains hold for SVHN and CIFAR-100. Regularized augmentation-based methods in fact even outperform handcrafted networks that were explicitly designed to be spatial-equivariant. Furthermore, we observe for SVHN, known to have inherent variance in orientation, that robust training also improves standard accuracy on the test set. We prove that this notrade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, Martin J. Wainwright
We propose an alternative framework to existing setups for controlling false alarms when multiple A/B tests are run over time. This setup arises in many practical applications, e.g. when pharmaceutical companies test new treatment options against control pills for different diseases, or when internet companies test their default webpages versus various alternatives over time. Our framework proposes to replace a sequence of A/B tests by a sequence of best-arm MAB instances, which can be continuously monitored by the data scientist. When interleaving the MAB tests with an online false discovery rate (FDR) algorithm, we can obtain the best of both worlds: low sample complexity and any time online FDR control. Our main contributions are: (i) to propose reasonable definitions of a null hypothesis for MAB instances; (ii) to demonstrate how one can derive an always-valid sequential p-value that allows continuous monitoring of each MAB test; and (iii) to show that using rejection thresholds of online-FDR algorithms as the confidence levels for the MAB algorithms results in both sample-optimality, high power and low FDR at any point in time. We run extensive simulations to verify our claims, and also report results on real data collected from the New Yorker Cartoon Caption contest.
Early stopping for kernel boosting algorithms: A general analysis with localized complexities
Yuting Wei, Fanny Yang, Martin J. Wainwright
Early stopping of iterative algorithms is a widely-used form of regularization in statistics, commonly used in conjunction with boosting and related gradienttype algorithms. Although consistency results have been established in some settings, such estimators are less well-understood than their analogues based on penalized regularization.
Online control of the false discovery rate with decaying memory
Aaditya Ramdas, Fanny Yang, Martin J. Wainwright, Michael I. Jordan
In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. Alpha-investing algorithms to control the false discovery rate (FDR), formulated by Foster and Stine, have been generalized and applied to many settings, including quality-preserving databases in science and multiple A/B or multi-armed bandit tests for internet commerce. This paper improves the class of generalized alpha-investing algorithms (GAI) in four ways: (a) we show how to uniformly improve the power of the entire class of monotone GAI procedures by awarding more alpha-wealth for each rejection, giving a win-win resolution to a recent dilemma raised by Javanmard and Montanari, (b) we demonstrate how to incorporate prior weights to indicate domain knowledge of which hypotheses are likely to be non-null, (c) we allow for differing penalties for false discoveries to indicate that some hypotheses may be more important than others, (d) we define a new quantity called the decaying memory false discovery rate (mem-FDR) that may be more meaningful for truly temporal applications, and which alleviates problems that we describe and refer to as "piggybacking" and "alpha-death." Our GAI++ algorithms incorporate all four generalizations simultaneously, and reduce to more powerful variants of earlier algorithms when the weights and decay are all set to unity. Finally, we also describe a simple method to derive new online FDR rules based on an estimated false discovery proportion.