
Munchausen Reinforcement Learning

Neural Information Processing Systems

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state with their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in this way yields an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns, or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, setting a new state of the art with very few modifications to the original algorithm. To complement this empirical study, we provide strong theoretical insights into what happens under the hood: implicit Kullback-Leibler regularization and an increase of the action-gap.
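As a concrete illustration, here is a minimal sketch of the regression target this idea induces in DQN, assuming a softmax policy derived from the target network's Q-values; the tensor names and the specific values of the temperature tau, the scaling alpha, and the clipping value l0 are illustrative choices, not taken from this excerpt.

    import torch
    import torch.nn.functional as F

    def munchausen_dqn_target(q_t, q_tp1, actions, rewards, dones,
                              gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
        # q_t, q_tp1: [batch, n_actions] target-network Q-values at s_t, s_{t+1}.
        # Policy implied by the Q-values: pi = softmax(q / tau).
        log_pi_t = F.log_softmax(q_t / tau, dim=-1)
        log_pi_tp1 = F.log_softmax(q_tp1 / tau, dim=-1)
        pi_tp1 = log_pi_tp1.exp()

        # The Munchausen term: scaled log-policy of the action actually taken,
        # clipped from below (log pi can diverge to -inf).
        log_pi_a = log_pi_t.gather(1, actions.unsqueeze(1)).squeeze(1)
        munchausen = alpha * torch.clamp(tau * log_pi_a, min=l0)

        # Soft bootstrap at s_{t+1}: expectation under pi of (q - tau * log pi).
        soft_v_tp1 = (pi_tp1 * (q_tp1 - tau * log_pi_tp1)).sum(dim=-1)

        # Augmented reward plus discounted soft value of the next state.
        return rewards + munchausen + (1.0 - dones) * gamma * soft_v_tp1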


A Appendix

Neural Information Processing Systems

A.1 Conventional Test-Time Augmentation

Center-Crop is the standard test-time augmentation for most computer vision tasks [56, 29, 5, 7, 18, 26, 52]. Center-Crop first resizes an image to a fixed size and then crops the central area to the predefined input size. For ResNet-50 in the ImageNet experiments, we resize an image to 256 pixels and crop the central 224 pixels, in the same way as [18, 26, 52]. In the case of CIFAR, all images in the dataset are 32 by 32 pixels; we use the original images without any modification at test time. Horizontal-Flip is an ensemble method that uses the original image and its horizontally flipped counterpart.
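For concreteness, a minimal sketch of these two conventional augmentations using torchvision; the resize and crop sizes follow the text, while the model and input names are placeholders.

    import torch
    import torchvision.transforms as T

    # Center-Crop: resize the shorter side to 256, then crop the central 224x224.
    center_crop = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
    ])

    @torch.no_grad()
    def hflip_ensemble(model, x):
        # Average class probabilities over the original and flipped input.
        p = model(x).softmax(-1)
        p_flipped = model(torch.flip(x, dims=[-1])).softmax(-1)  # flip width axis
        return (p + p_flipped) / 2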


Learning Loss for Test-Time Augmentation

Neural Information Processing Systems

Data augmentation has been actively studied for robust neural networks. Most recent data augmentation methods focus on augmenting datasets during the training phase. At the testing phase, simple transformations are still widely used for test-time augmentation. This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input. Our proposed method uses an auxiliary module to predict the loss of each possible transformation given the input. Then, the transformations with the lowest predicted losses are applied to the input.
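A hedged sketch of the selection step this describes; the loss predictor's architecture and the candidate transformation set are not specified in this excerpt, so the names below are illustrative.

    import torch

    @torch.no_grad()
    def select_and_apply(model, loss_predictor, x, candidates, k=1):
        # Predict a loss for each candidate transformation of this input.
        predicted_losses = loss_predictor(x)  # shape: [len(candidates)]
        # Keep the k transformations with the lowest predicted loss.
        chosen = torch.topk(-predicted_losses, k).indices.tolist()
        # Ensemble the model over the chosen views of the input.
        probs = [model(candidates[i](x)).softmax(-1) for i in chosen]
        return torch.stack(probs).mean(dim=0)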


An efficient instance-aware test-time augmentation method resulting in significant gains over previous approaches

Neural Information Processing Systems

We would like to thank you for your thorough evaluation, helpful suggestions, and comments.

[Figure 1: comparison over the same 5-Crop transform candidates on the clean ImageNet set using ResNet-50. Figure 2: comparison over the same GPS transform candidates on the clean ImageNet set using ResNet-50.]

We trained our loss predictor on the five crop areas; compared to the 5-Crop ensemble, our method chooses one transform for each test instance. We also trained our loss predictor on the searched GPS policies to choose the ones specific to each instance.

    Test-time augmentation | Relative cost | Clean set | Corrupted set | Corrupted test-set
    Center-Crop            | 1             | 24.14     | 78.93         | 75.42

A detailed comparison will be included.



Supplementary Material to "Sufficient dimension reduction for classification using principal optimal transport direction"

Neural Information Processing Systems

Without loss of generality, we assume S(B) = S. Hence, to prove Theorem 1, it is sufficient to show that S(B) = S(Σ) holds. To verify S(B) = S(Σ), we only need to show that the following two results hold: (I) and (II). We now begin with Statement (I). This completes the proof of Statement (I). We then turn to Statement (II). This leads to a contradiction with (H.2), where the structural dimension is r.
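The statements themselves are elided in this excerpt; presumably, as is standard for proving equality of two subspaces, they are the two inclusions (a hedged reconstruction, not the paper's exact wording):

    % Hedged reconstruction; the paper's exact statements (I) and (II) may differ.
    \text{(I)}\;\; \mathcal{S}(B) \subseteq \mathcal{S}(\Sigma),
    \qquad
    \text{(II)}\;\; \mathcal{S}(\Sigma) \subseteq \mathcal{S}(B),
    % which together yield S(B) = S(Sigma) and hence Theorem 1.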



Appendix A Assessing Conditional Independence/Dependence in CIFAR-10H and Imagenet-16H Datasets

Neural Information Processing Systems

We investigate the degree to which our conditional independence assumption is satisfied empirically in the datasets used in the paper. Specifically, of interest is the assumption of conditional independence of m(x) and h(x), given y. Assessing conditional independence is not straightforward, given that m(x) is a K-dimensional real-valued vector and h(x) and y each take one of K categorical values, with K = 10 for CIFAR-10H and K = 16 for ImageNet-16H. While there exist statistical tests for assessing conditional independence for categorical random variables, with real-valued variables the situation is less straightforward, and there are multiple options, such as different non-parametric tests involving different tradeoffs [Runge, 2018, Marx and Vreeken, 2019, Mukherjee et al., 2020, Berrett et al., 2020]. Given these issues, we investigate the degree of conditional dependence using two relatively simple approaches.
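One simple check of this kind (an illustrative sketch, not necessarily one of the paper's two approaches): within each true class y, estimate the mutual information between the model's predicted label argmax m(x) and the human label h(x); values near zero are consistent with conditional independence. Array names are placeholders.

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def per_class_conditional_mi(model_probs, human_labels, true_labels):
        # model_probs: [n, K] model outputs; human_labels, true_labels: [n] ints.
        model_labels = model_probs.argmax(axis=1)
        mi_per_class = {}
        for y in np.unique(true_labels):
            mask = true_labels == y
            # Mutual information between model and human predictions, given y;
            # values near zero are consistent with conditional independence.
            mi_per_class[int(y)] = mutual_info_score(model_labels[mask],
                                                     human_labels[mask])
        return mi_per_class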


Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration
Gavin Kerrigan, Mark Steyvers
Department of Computer Science

Neural Information Processing Systems

An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.
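As a sketch of how such a combination can work, assuming a Bayes-style rule that reweights the model's probability vector by an estimated human confusion matrix P(h | y); this is a plausible reading of the abstract, not the paper's exact algorithm, and the smoothing choice below is illustrative.

    import numpy as np

    def estimate_confusion(human_labels, true_labels, K, smoothing=1.0):
        # Row-normalized human confusion matrix C[y, h] ~ P(h | y); additive
        # smoothing keeps the estimate usable with only a few labeled points.
        C = np.full((K, K), smoothing)
        for y, h in zip(true_labels, human_labels):
            C[y, h] += 1
        return C / C.sum(axis=1, keepdims=True)

    def combine(model_probs, human_label, confusion):
        # Posterior over classes: the model's p(y) reweighted by P(h | y),
        # then renormalized to sum to one.
        post = model_probs * confusion[:, human_label]
        return post / post.sum()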