AITopics

Classification is one of the core problems in statistical learning and has been intensively studied in statistical and machine learning literature. Nevertheless, while the theory for binary classification is well developed (see, Devroy, Gyöfri and Lugosi, 1996; Vapnik, 2000; Boucheron, Bousquet and Lugosi, 2005 and references therein for a comprehensive review), its multiclass extensions are much less complete. Consider a general L-class classification with a (high-dimensional) vector of features X X R d and the outcome class label Y {1,..., L}. We can model it as Y (X x) Mult(p 1 (x),..., p L (x)), where p l (x) P (Y l X x), l 1,..., L. A classifier is a measurable function η: X {1,..., L}. The accuracy of a classifier η is defined by a misclassification error R(η) P (Y η(x)). The optimal classifier that minimizes this error is the Bayes classifier η (x) arg max 1 l L p l (x) with R(η) 1 E X max 1 l L p l (x). The probabilities p l (x)'s are, however, unknown and one should derive a classifier η(x) from the available data D: a random sample of n independent observations (X 1, Y 1),..., (X n, Y n) from the joint distribution of (X, Y). The corresponding (conditional) misclassification error of η is R( η) P (Y η(x) D) and the goodness of η w.r.t.

classification, classifier, misclassification excess risk, (11 more...)

2003.01951

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:

Research Report (0.68)
Overview (0.66)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Gaussianization Flows

Meng, Chenlin, Song, Yang, Song, Jiaming, Ermon, Stefano

Iterative Gaussianization is a fixed-point iteration procedure that can transform any continuous random vector into a Gaussian one. Based on iterative Gaussianization, we propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation. We demonstrate that these models, named Gaussianization flows, are universal approximators for continuous probability distributions under some regularity conditions. Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation. Experimentally, we show that Gaussianization flows achieve better or comparable performance on several tabular datasets compared to other efficiently invertible flow models such as Real NVP, Glow and FFJORD. In particular, Gaussianization flows are easier to initialize, demonstrate better robustness with respect to different transformations of the training data, and generalize better on small training sets.

dataset, gaussianization, rotation matrix, (13 more...)

2003.01941

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Sicily > Palermo (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.31)

Jeong, Yonghyun, Choi, Hyunjin, Kim, Byoungjip, Gwon, Youngjune

DefogGAN: Predicting Hidden Information in the StarCraft Fog of War with Generative Adversarial Nets

We propose DefogGAN, a generative approach to the problem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as predictive information. Such information can lead to create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring pyramidal reconstruction loss to optimize on multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict the enemy buildings and combat units as accurately as professional players do and achieves a superior performance among state-of-the-art defoggers. Figure 1: Comparison of DefogGAN prediction to ground truth.

defoggan, information, partial observation, (14 more...)

2003.01927

Country:

Asia (0.04)
Africa > Middle East > Djibouti > Arta > `Arta (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Transformation Importance with Applications to Cosmology

Singh, Chandan, Ha, Wooseok, Lanusse, Francois, Boehm, Vanessa, Liu, Jia, Yu, Bin

Machine learning lies at the heart of new possibilities for scientific discovery, knowledge generation, and artificial intelligence. Its potential benefits to these fields requires going beyond predictive accuracy and focusing on interpretability. In particular, many scientific problems require interpretations in a domain-specific interpretable feature space (e.g. the frequency domain) whereas attributions to the raw features (e.g. the pixel space) may be unintelligible or even misleading. To address this challenge, we propose TRIM (TRansformation IMportance), a novel approach which attributes importances to features in a transformed space and can be applied post-hoc to a fully trained model. TRIM is motivated by a cosmological parameter estimation problem using deep neural networks (DNNs) on simulated data, but it is generally applicable across domains/models and can be combined with any local interpretation method. In our cosmology example, combining TRIM with contextual decomposition shows promising results for identifying which frequencies a DNN uses, helping cosmologists to understand and validate that the model learns appropriate physical features rather than simulation artifacts.

arxiv preprint arxiv, attribution, transformation, (15 more...)

2003.01926

Country:

North America > United States (0.68)
Asia > China (0.14)
Europe > Russia (0.04)
(5 more...)

Genre: Research Report (0.70)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Wei, Chen-Yu, Luo, Haipeng, Agarwal, Alekh

Taking a hint: How to leverage loss predictors in contextual bandits?

We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the total error of the predictor $\mathcal{E} \leq T$ is relatively small. We provide a complete answer to this question, including upper and lower bounds for various settings: adversarial versus stochastic environments, known versus unknown $\mathcal{E}$, and single versus multiple predictors. We show several surprising results, such as 1) the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^\frac{1}{4}\})$ when $\mathcal{E}$ is known, a sharp contrast to the standard and better bound $\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^\frac{1}{3})$ is achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary, even if logarithmic dependence is possible for non-contextual problems. We also develop several novel algorithmic techniques to achieve matching upper bounds, including 1) a key action remapping technique for optimal regret with known $\mathcal{E}$, 2) implementing Catoni's robust mean estimator efficiently via an ERM oracle leading to an efficient algorithm in the stochastic setting with optimal regret, 3) constructing an underestimator for $\mathcal{E}$ via estimating the histogram with bins of exponentially increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a self-referential scheme for learning with multiple predictors, all of which might be of independent interest.

algorithm, leveraging loss predictor, predictor, (10 more...)

2003.01922

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks

Fetaya, Ethan, Lifshitz, Yonatan, Aaron, Elad, Gordin, Shai

The main source of information regarding ancient Mesopotamian history and culture are clay cuneiform tablets. Despite being an invaluable resource, many tablets are fragmented leading to missing information. Currently these missing parts are manually completed by experts. In this work we investigate the possibility of assisting scholars and even automatically completing the breaks in ancient Akkadian texts from Achaemenid period Babylonia by modelling the language using recurrent neural networks.

akkadian, archive, tablet, (16 more...)

2003.01912

Country:

Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Black-box Smoothing: A Provable Defense for Pretrained Classifiers

Salman, Hadi, Sun, Mingjie, Yang, Greg, Kapoor, Ashish, Kolter, J. Zico

We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. The approach applies both to the case where we have full access to the pretrained classifier as well as the case where we only have query access. We refer to this defense as black-box smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our method to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at https://github.com/microsoft/blackbox-smoothing .

classifier, denoiser, mse, (15 more...)

2003.01908

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Washington > King County > Redmond (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Nakkiran, Preetum, Venkat, Prayaag, Kakade, Sham, Ma, Tengyu

Optimal Regularization Can Mitigate Double Descent

Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised questions of if we need to re-think our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned $\ell_2$ regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.

double descent, monotonicity, regression, (14 more...)

2003.01897

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Sudan (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

Park, Jaewoo, Jung, Yoon Gyo, Teoh, Andrew Beng Jin

Compact Surjective Encoding Autoencoder for Unsupervised Novelty Detection

In unsupervised novelty detection, a model is trained solely on the in-class data, and infer to single out out-class data. Autoencoder (AE) variants aim to compactly model the in-class data to reconstruct it exclusively, differentiating it from out-class by the reconstruction error. However, imposing compactness improperly may damage in-class reconstruction and, therefore, detection performance. To solve this, we propose Compact Surjective Encoding AE (CSE-AE). In this model, the encoding of any input is constrained into a compact manifold by exploiting the deep neural net's ignorance of the unknown. Concurrently, the in-class data is surjectively encoded to the compact manifold via AE. The mechanism is realized by both GAN and its ensembled discriminative layers, and results to reconstruct the in-class exclusively. In inference, the reconstruction error of a query is measured using high-level semantics captured by the discriminator. Extensive experiments on image data show that the proposed model gives state-of-the-art performance.

arxiv preprint arxiv, detection, novelty detection, (11 more...)

2003.01665

Country: Asia (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.93)
Information Technology (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Panaganti, Kishan, Kalathil, Dileep

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call FP-UCB algorithm, which uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves a bounded regret under some structural condition on the underlying parameter set. We also show that, if the underlying parameter set does not satisfy the necessary structural condition, FP-UCB algorithm achieves a logarithmic regret, but with a smaller preceding constant compared to the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.

algorithm, bandit, fp-ucb algorithm, (15 more...)

2003.01328

Country: North America > United States > Texas > Brazos County > College Station (0.14)

Genre: Research Report (0.50)

Industry: Media (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)