
Munchausen Reinforcement Learning

Neural Information Processing Systems

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state with their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in this way yields an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns, or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, setting a new state of the art with very few modifications to the original algorithm. To complement this empirical study, we provide strong theoretical insights into what happens under the hood: implicit Kullback-Leibler regularization and an increase of the action-gap.
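As a concrete illustration, here is a minimal sketch of the regression target this idea induces in DQN, assuming a softmax policy derived from the target network's Q-values; the tensor names and the specific values of the temperature tau, the scaling alpha, and the clipping value l0 are illustrative choices, not taken from this excerpt.

    import torch
    import torch.nn.functional as F

    def munchausen_dqn_target(q_t, q_tp1, actions, rewards, dones,
                              gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
        # q_t, q_tp1: [batch, n_actions] target-network Q-values at s_t, s_{t+1}.
        # Policy implied by the Q-values: pi = softmax(q / tau).
        log_pi_t = F.log_softmax(q_t / tau, dim=-1)
        log_pi_tp1 = F.log_softmax(q_tp1 / tau, dim=-1)
        pi_tp1 = log_pi_tp1.exp()

        # The Munchausen term: scaled log-policy of the action actually taken,
        # clipped from below (log pi can diverge to -inf).
        log_pi_a = log_pi_t.gather(1, actions.unsqueeze(1)).squeeze(1)
        munchausen = alpha * torch.clamp(tau * log_pi_a, min=l0)

        # Soft bootstrap at s_{t+1}: expectation under pi of (q - tau * log pi).
        soft_v_tp1 = (pi_tp1 * (q_tp1 - tau * log_pi_tp1)).sum(dim=-1)

        # Augmented reward plus discounted soft value of the next state.
        return rewards + munchausen + (1.0 - dones) * gamma * soft_v_tp1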


A Appendix

Neural Information Processing Systems

A.1 Conventional Test-Time Augmentation

Center-Crop is the standard test-time augmentation for most computer vision tasks [56, 29, 5, 7, 18, 26, 52]. Center-Crop first resizes an image to a fixed size and then crops the central area to the predefined input size. For ResNet-50 in the ImageNet experiments, we resize an image to 256 pixels and crop the central 224 pixels, in the same way as [18, 26, 52]. In the case of CIFAR, all images in the dataset are 32 by 32 pixels; we use the original images without any modification at test time. Horizontal-Flip is an ensemble method that uses the original image and its horizontally flipped counterpart.
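For concreteness, a minimal sketch of these two conventional augmentations using torchvision; the resize and crop sizes follow the text, while the model and input names are placeholders.

    import torch
    import torchvision.transforms as T

    # Center-Crop: resize the shorter side to 256, then crop the central 224x224.
    center_crop = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
    ])

    @torch.no_grad()
    def hflip_ensemble(model, x):
        # Average class probabilities over the original and flipped input.
        p = model(x).softmax(-1)
        p_flipped = model(torch.flip(x, dims=[-1])).softmax(-1)  # flip width axis
        return (p + p_flipped) / 2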


Learning Loss for Test-Time Augmentation

Neural Information Processing Systems

Data augmentation has been actively studied for robust neural networks. Most recent data augmentation methods focus on augmenting datasets during the training phase. At the testing phase, simple transformations are still widely used for test-time augmentation. This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input. Our proposed method uses an auxiliary module to predict the loss of each possible transformation given the input. Then, the transformations with the lowest predicted losses are applied to the input.
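A hedged sketch of the selection step this describes; the loss predictor's architecture and the candidate transformation set are not specified in this excerpt, so the names below are illustrative.

    import torch

    @torch.no_grad()
    def select_and_apply(model, loss_predictor, x, candidates, k=1):
        # Predict a loss for each candidate transformation of this input.
        predicted_losses = loss_predictor(x)  # shape: [len(candidates)]
        # Keep the k transformations with the lowest predicted loss.
        chosen = torch.topk(-predicted_losses, k).indices.tolist()
        # Ensemble the model over the chosen views of the input.
        probs = [model(candidates[i](x)).softmax(-1) for i in chosen]
        return torch.stack(probs).mean(dim=0)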


An efficient instance-aware test-time augmentation method resulting in significant gains over previous approaches

Neural Information Processing Systems

We would like to thank you for your thorough evaluation, helpful suggestions, and comments.

[Figure 1: comparison over the same 5-Crop transform candidates on the clean ImageNet set using ResNet-50. Figure 2: comparison over the same GPS transform candidates on the clean ImageNet set using ResNet-50.]

We trained our loss predictor on the five crop areas; compared to the 5-Crop ensemble, our method chooses one transform for each test instance. We also trained our loss predictor on the searched GPS policies to choose the ones specific to each instance.

    Test-time augmentation | Relative cost | Clean set | Corrupted set | Corrupted test-set
    Center-Crop            | 1             | 24.14     | 78.93         | 75.42

A detailed comparison will be included.



Supplementary Material to "Sufficient dimension reduction for classification using principal optimal transport direction"

Neural Information Processing Systems

Without loss of generality, we assume S(B) = S. Hence, to prove Theorem 1, it is sufficient to show that S(B) = S(Σ) holds. To verify S(B) = S(Σ), we only need to show that the following two results hold: (I) and (II). We now begin with Statement (I). This completes the proof of Statement (I). We then turn to Statement (II). This leads to a contradiction with (H.2), where the structural dimension is r.
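The statements themselves are elided in this excerpt; presumably, as is standard for proving equality of two subspaces, they are the two inclusions (a hedged reconstruction, not the paper's exact wording):

    % Hedged reconstruction; the paper's exact statements (I) and (II) may differ.
    \text{(I)}\;\; \mathcal{S}(B) \subseteq \mathcal{S}(\Sigma),
    \qquad
    \text{(II)}\;\; \mathcal{S}(\Sigma) \subseteq \mathcal{S}(B),
    % which together yield S(B) = S(Sigma) and hence Theorem 1.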



Appendix A Assessing Conditional Independence/Dependence in CIFAR-10H and Imagenet-16H Datasets

Neural Information Processing Systems

We investigate the degree to which our conditional independence assumption is satisfied empirically in the datasets used in the paper. Specifically, of interest is the assumption of conditional independence of m(x) and h(x), given y. Assessing conditional independence is not straightforward, given that m(x) is a K-dimensional real-valued vector and h(x) and y each take one of K categorical values, with K = 10 for CIFAR-10H and K = 16 for ImageNet-16H. While there exist statistical tests for assessing conditional independence for categorical random variables, with real-valued variables the situation is less straightforward, and there are multiple options, such as different non-parametric tests involving different tradeoffs [Runge, 2018, Marx and Vreeken, 2019, Mukherjee et al., 2020, Berrett et al., 2020]. Given these issues, we investigate the degree of conditional dependence using two relatively simple approaches.
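One simple check of this kind (an illustrative sketch, not necessarily one of the paper's two approaches): within each true class y, estimate the mutual information between the model's predicted label argmax m(x) and the human label h(x); values near zero are consistent with conditional independence. Array names are placeholders.

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def per_class_conditional_mi(model_probs, human_labels, true_labels):
        # model_probs: [n, K] model outputs; human_labels, true_labels: [n] ints.
        model_labels = model_probs.argmax(axis=1)
        mi_per_class = {}
        for y in np.unique(true_labels):
            mask = true_labels == y
            # Mutual information between model and human predictions, given y;
            # values near zero are consistent with conditional independence.
            mi_per_class[int(y)] = mutual_info_score(model_labels[mask],
                                                     human_labels[mask])
        return mi_per_class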


Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration
Gavin Kerrigan, Mark Steyvers
Department of Computer Science

Neural Information Processing Systems

An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.
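As a sketch of how such a combination can work, assuming a Bayes-style rule that reweights the model's probability vector by an estimated human confusion matrix P(h | y); this is a plausible reading of the abstract, not the paper's exact algorithm, and the smoothing choice below is illustrative.

    import numpy as np

    def estimate_confusion(human_labels, true_labels, K, smoothing=1.0):
        # Row-normalized human confusion matrix C[y, h] ~ P(h | y); additive
        # smoothing keeps the estimate usable with only a few labeled points.
        C = np.full((K, K), smoothing)
        for y, h in zip(true_labels, human_labels):
            C[y, h] += 1
        return C / C.sum(axis=1, keepdims=True)

    def combine(model_probs, human_label, confusion):
        # Posterior over classes: the model's p(y) reweighted by P(h | y),
        # then renormalized to sum to one.
        post = model_probs * confusion[:, human_label]
        return post / post.sum()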