Country
Multiclass classification by sparse multinomial logistic regression
Abramovich, Felix, Grinshtein, Vadim, Levy, Tomer
Classification is one of the core problems in statistical learning and has been intensively studied in statistical and machine learning literature. Nevertheless, while the theory for binary classification is well developed (see, Devroy, Gyöfri and Lugosi, 1996; Vapnik, 2000; Boucheron, Bousquet and Lugosi, 2005 and references therein for a comprehensive review), its multiclass extensions are much less complete. Consider a general L-class classification with a (high-dimensional) vector of features X X R d and the outcome class label Y {1,..., L}. We can model it as Y (X x) Mult(p 1 (x),..., p L (x)), where p l (x) P (Y l X x), l 1,..., L. A classifier is a measurable function η: X {1,..., L}. The accuracy of a classifier η is defined by a misclassification error R(η) P (Y η(x)). The optimal classifier that minimizes this error is the Bayes classifier η (x) arg max 1 l L p l (x) with R(η) 1 E X max 1 l L p l (x). The probabilities p l (x)'s are, however, unknown and one should derive a classifier η(x) from the available data D: a random sample of n independent observations (X 1, Y 1),..., (X n, Y n) from the joint distribution of (X, Y). The corresponding (conditional) misclassification error of η is R( η) P (Y η(x) D) and the goodness of η w.r.t.
Gaussianization Flows
Meng, Chenlin, Song, Yang, Song, Jiaming, Ermon, Stefano
Iterative Gaussianization is a fixed-point iteration procedure that can transform any continuous random vector into a Gaussian one. Based on iterative Gaussianization, we propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation. We demonstrate that these models, named Gaussianization flows, are universal approximators for continuous probability distributions under some regularity conditions. Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation. Experimentally, we show that Gaussianization flows achieve better or comparable performance on several tabular datasets compared to other efficiently invertible flow models such as Real NVP, Glow and FFJORD. In particular, Gaussianization flows are easier to initialize, demonstrate better robustness with respect to different transformations of the training data, and generalize better on small training sets.
DefogGAN: Predicting Hidden Information in the StarCraft Fog of War with Generative Adversarial Nets
Jeong, Yonghyun, Choi, Hyunjin, Kim, Byoungjip, Gwon, Youngjune
We propose DefogGAN, a generative approach to the problem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as predictive information. Such information can lead to create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring pyramidal reconstruction loss to optimize on multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict the enemy buildings and combat units as accurately as professional players do and achieves a superior performance among state-of-the-art defoggers. Figure 1: Comparison of DefogGAN prediction to ground truth.
Transformation Importance with Applications to Cosmology
Singh, Chandan, Ha, Wooseok, Lanusse, Francois, Boehm, Vanessa, Liu, Jia, Yu, Bin
Machine learning lies at the heart of new possibilities for scientific discovery, knowledge generation, and artificial intelligence. Its potential benefits to these fields requires going beyond predictive accuracy and focusing on interpretability. In particular, many scientific problems require interpretations in a domain-specific interpretable feature space (e.g. the frequency domain) whereas attributions to the raw features (e.g. the pixel space) may be unintelligible or even misleading. To address this challenge, we propose TRIM (TRansformation IMportance), a novel approach which attributes importances to features in a transformed space and can be applied post-hoc to a fully trained model. TRIM is motivated by a cosmological parameter estimation problem using deep neural networks (DNNs) on simulated data, but it is generally applicable across domains/models and can be combined with any local interpretation method. In our cosmology example, combining TRIM with contextual decomposition shows promising results for identifying which frequencies a DNN uses, helping cosmologists to understand and validate that the model learns appropriate physical features rather than simulation artifacts.
Taking a hint: How to leverage loss predictors in contextual bandits?
Wei, Chen-Yu, Luo, Haipeng, Agarwal, Alekh
We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the total error of the predictor $\mathcal{E} \leq T$ is relatively small. We provide a complete answer to this question, including upper and lower bounds for various settings: adversarial versus stochastic environments, known versus unknown $\mathcal{E}$, and single versus multiple predictors. We show several surprising results, such as 1) the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^\frac{1}{4}\})$ when $\mathcal{E}$ is known, a sharp contrast to the standard and better bound $\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^\frac{1}{3})$ is achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary, even if logarithmic dependence is possible for non-contextual problems. We also develop several novel algorithmic techniques to achieve matching upper bounds, including 1) a key action remapping technique for optimal regret with known $\mathcal{E}$, 2) implementing Catoni's robust mean estimator efficiently via an ERM oracle leading to an efficient algorithm in the stochastic setting with optimal regret, 3) constructing an underestimator for $\mathcal{E}$ via estimating the histogram with bins of exponentially increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a self-referential scheme for learning with multiple predictors, all of which might be of independent interest.
Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks
Fetaya, Ethan, Lifshitz, Yonatan, Aaron, Elad, Gordin, Shai
The main source of information regarding ancient Mesopotamian history and culture are clay cuneiform tablets. Despite being an invaluable resource, many tablets are fragmented leading to missing information. Currently these missing parts are manually completed by experts. In this work we investigate the possibility of assisting scholars and even automatically completing the breaks in ancient Akkadian texts from Achaemenid period Babylonia by modelling the language using recurrent neural networks.
Black-box Smoothing: A Provable Defense for Pretrained Classifiers
Salman, Hadi, Sun, Mingjie, Yang, Greg, Kapoor, Ashish, Kolter, J. Zico
We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. The approach applies both to the case where we have full access to the pretrained classifier as well as the case where we only have query access. We refer to this defense as black-box smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our method to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at https://github.com/microsoft/blackbox-smoothing .
Optimal Regularization Can Mitigate Double Descent
Nakkiran, Preetum, Venkat, Prayaag, Kakade, Sham, Ma, Tengyu
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised questions of if we need to re-think our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned $\ell_2$ regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.
Compact Surjective Encoding Autoencoder for Unsupervised Novelty Detection
Park, Jaewoo, Jung, Yoon Gyo, Teoh, Andrew Beng Jin
In unsupervised novelty detection, a model is trained solely on the in-class data, and infer to single out out-class data. Autoencoder (AE) variants aim to compactly model the in-class data to reconstruct it exclusively, differentiating it from out-class by the reconstruction error. However, imposing compactness improperly may damage in-class reconstruction and, therefore, detection performance. To solve this, we propose Compact Surjective Encoding AE (CSE-AE). In this model, the encoding of any input is constrained into a compact manifold by exploiting the deep neural net's ignorance of the unknown. Concurrently, the in-class data is surjectively encoded to the compact manifold via AE. The mechanism is realized by both GAN and its ensembled discriminative layers, and results to reconstruct the in-class exclusively. In inference, the reconstruction error of a query is measured using high-level semantics captured by the discriminator. Extensive experiments on image data show that the proposed model gives state-of-the-art performance.
Bounded Regret for Finitely Parameterized Multi-Armed Bandits
Panaganti, Kishan, Kalathil, Dileep
We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call FP-UCB algorithm, which uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves a bounded regret under some structural condition on the underlying parameter set. We also show that, if the underlying parameter set does not satisfy the necessary structural condition, FP-UCB algorithm achieves a logarithmic regret, but with a smaller preceding constant compared to the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.