Maximum Entropy
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher, Onofrio Semeraro, Lionel Mathelin
The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, the robustness of entropy-regularised policies to noise contaminating the agent's observations is demonstrated. Second, notions from statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. The results show a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.65)
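For reference, the entropy-regularised objective underlying such policies is usually written as follows (a standard soft-RL formulation with temperature α; the notation is ours, not quoted from the paper):

```latex
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big( r(s_t, a_t) \;+\; \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) \;=\; -\sum_{a} \pi(a \mid s)\, \log \pi(a \mid s).
```

Setting α = 0 recovers the standard RL objective; the entropy bonus is the regulariser whose robustness effects the abstract studies.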
Review for NeurIPS paper: Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Weaknesses: It is not clear what the main technical contributions of the paper are. The paper oversells its theoretical results, and the motivation for the proposed regularizer is weak. The paper misrepresents its contributions in terms of cosmetic theorems and lemmas. See points (i)-(iv) below. In the Appendix, Line 70, it is written that ``After extending it to the case when Y is a deterministic function of X, we get the bound in Theorem 3''.
Review for NeurIPS paper: Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
The paper was extensively discussed among the reviewers. The final outcome was that all the reviewers agreed that the theoretical part of the paper is not significantly novel and the authors have to rewrite that part (please see the updated reviews); however, the approach is novel and the experimental part is strong. To evaluate the experimental part further, a new reviewer who has a good understanding of the experimental side of adversarial data augmentation was added after the rebuttal. The new reviewer confirmed that the usefulness of the entropy-based regularization term toward providing robustness against unseen shifts is significant.
Review for NeurIPS paper: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
Correctness: The main technical content seems to be correct. I have the following questions though: When using the linearity assumption for the reward and the dynamics, the feature selection/setting is crucial. It is also mentioned that, to relax the linearity assumption, features can be pre-trained. What would be the recommended way to pre-learn them? If the assumptions are violated, how would that affect the results in practice?
Review for NeurIPS paper: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
This is a borderline paper. The paper is technically sound, and addressing OPE in the average-reward setting is an important problem. Although the work is an extension of Duan and Wang (for the discounted setting) to the average-reward setting, the algorithm is somewhat different: Duan and Wang use FQE, whereas the current paper performs stationary-distribution estimation. That said, there are a few weaknesses that the paper should try to address or at least discuss: 1. The entropy maximization is a novel algorithmic element which does not appear in previous approaches in the discounted setting.
Reviews: Maximum Entropy Monte-Carlo Planning
This paper proposes a new MCTS algorithm, Maximum Entropy for Tree Search (MENTS), which combines the maximum entropy policy optimization framework with MCTS for more efficient online planning in sequential decision problems. The main idea is to replace the Monte Carlo value estimate with the softmax value estimate from the maximum entropy policy optimization framework, so that state values can be estimated and back-propagated more efficiently in the search tree. Another main novelty is the Empirical Exponential Weight (E2W) algorithm, used as the tree policy to drive additional exploration. It shows that MENTS achieves an exponential convergence rate towards finding the optimal action at the root of the tree, which is much faster than the polynomial convergence rate of the UCT method. The experimental results also demonstrate that MENTS performs significantly better than UCT in terms of sample efficiency, in both synthetic problems and Atari games.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)
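The softmax value backup at the core of MENTS can be sketched as follows (a minimal illustration; the function names and temperature handling are ours, not taken from the paper):

```python
import math

def soft_value(q_values, tau=1.0):
    # Softmax (log-sum-exp) backup: V = tau * log(sum_a exp(Q(a) / tau)).
    # Shifting by the max keeps the exponentials numerically stable.
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def soft_policy(q_values, tau=1.0):
    # Boltzmann policy induced by the soft value: pi(a) = exp((Q(a) - V) / tau).
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]
```

In MENTS, the E2W tree policy additionally mixes this Boltzmann distribution with a decaying uniform-exploration term; the sketch above only shows the backup and the induced policy. As tau approaches 0, `soft_value` converges to the hard max used in standard Monte Carlo backups.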
Reviews: Maximum Entropy Monte-Carlo Planning
This paper presents an appealing idea to combine current max-entropy methods in RL with Monte-Carlo Tree Search. A theoretical result shows improved rate of convergence, while empirical results show improved sample efficiency. The initial reviews were quite positive; I only noted a small number of issues mentioned in the reviews of R1 and R3. In our discussions after reading the author feedback, R3 noted that some of his concerns have not been addressed. R2 replied, saying that these concerns are relatively minor and can be addressed in the final version.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.79)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)
Self-Supervised Learning via Maximum Entropy Coding
A mainstream class of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.97)
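The coding-length surrogate can be sketched as a log-determinant of the scaled covariance of the representations, in the rate-distortion style the abstract alludes to; the exact constants and normalisation used in MEC may differ from this illustration:

```python
import numpy as np

def coding_length(Z, eps=0.5):
    # Minimal coding length of m d-dimensional representations Z (shape m x d)
    # up to distortion eps, used as a tractable surrogate for entropy:
    #   L = ((m + d) / 2) * logdet(I + d / (m * eps**2) * Z^T Z)
    m, d = Z.shape
    scaled_cov = (d / (m * eps ** 2)) * (Z.T @ Z)
    sign, logdet = np.linalg.slogdet(np.eye(d) + scaled_cov)
    return 0.5 * (m + d) * logdet
```

Maximising this quantity (subject to normalised features) pushes the representation towards spreading its energy across all directions, which is the "maximum entropy" behaviour the objective is after.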
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems
Matthew King-Roskamp, Rustum Choksi, Tim Hoheisel
We establish the theoretical framework for implementing the maximum entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop general estimates for the difference between the MEM solutions with different priors $\mu$ and $\nu$ based upon the epigraphical distance between their respective log-moment generating functions. These estimates allow us to establish a rate of convergence in expectation for empirical means. We illustrate our results with denoising on the MNIST and Fashion-MNIST data sets.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)
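Schematically, the MEM estimate is a KL projection of the prior onto the data constraints; in one standard formulation (the notation here is ours, not the paper's):

```latex
% MEM primal: KL projection of the prior \mu onto the data constraints
\hat{\rho} \;\in\; \operatorname*{arg\,min}_{\rho \ll \mu} \;\mathrm{KL}(\rho \,\|\, \mu)
\quad \text{s.t.} \quad A\, \mathbb{E}_{\rho}[x] = b,
\qquad
\hat{x} \;=\; \mathbb{E}_{\hat{\rho}}[x].
% The dual problem is governed by the log-moment generating function of \mu:
\Lambda_{\mu}(\lambda) \;=\; \log \int e^{\langle \lambda, x \rangle}\, d\mu(x).
```

The dual's dependence on $\Lambda_\mu$ is why the abstract's stability estimates between priors $\mu$ and $\nu$ are phrased in terms of the epigraphical distance between their log-moment generating functions.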
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
We introduce a new maximum entropy reinforcement learning framework based on the distribution of states and actions visited by a policy. More precisely, an intrinsic reward function is added to the reward function of the Markov decision process to be controlled. For each state and action, this intrinsic reward is the relative entropy of the discounted distribution of states and actions (or features from these states and actions) visited during the next time steps. We first prove that an optimal exploration policy, which maximizes the expected discounted sum of intrinsic rewards, also maximizes a lower bound on the state-action value function of the decision process under some assumptions. We also prove that the visitation distribution used in the intrinsic reward definition is the fixed point of a contraction operator. We then describe how to adapt existing algorithms to learn this fixed point and compute the intrinsic rewards to enhance exploration. A new practical off-policy maximum entropy reinforcement learning algorithm is finally introduced. Empirically, the resulting exploration policies have good state-action space coverage, and high-performing control policies are computed efficiently.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
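As a simplified illustration of the fixed point mentioned in the abstract, the discounted state-visitation distribution of a finite MDP satisfies d = (1 - gamma) * mu0 + gamma * P_pi^T d and can be computed by fixed-point iteration. This state-only version with a fixed start distribution is a simplification of the paper's per-state-action visitation measures:

```python
import numpy as np

def discounted_visitation(P, pi, gamma=0.9, tol=1e-10):
    # P[s, a, s'] is the transition kernel, pi[s, a] the policy.
    # mu0 is taken uniform here (an assumption of this sketch).
    n_states, n_actions, _ = P.shape
    # State-to-state kernel under pi: P_pi[s, s'] = sum_a pi[s, a] * P[s, a, s'].
    P_pi = np.einsum('sa,sat->st', pi, P)
    mu0 = np.full(n_states, 1.0 / n_states)
    d = mu0.copy()
    while True:
        # One application of the contraction operator (contraction factor gamma).
        d_new = (1 - gamma) * mu0 + gamma * P_pi.T @ d
        if np.abs(d_new - d).max() < tol:
            return d_new
        d = d_new
```

Because the operator is a gamma-contraction, the iteration converges geometrically to the unique fixed point; in the paper's setting, this fixed point feeds the relative-entropy intrinsic reward.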