AITopics | Maximum Entropy


Maximum Entropy


Maximum Entropy Reinforcement Learning with Diffusion Policy

Dong, Xiaoyi, Cheng, Jian, Zhang, Xi Sheryl

arXiv.org Artificial Intelligence, Feb-18-2025

The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on MuJoCo benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP.
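To illustrate why unimodality matters here, the following is a minimal sketch rather than the paper's method. It uses the standard result that, for a fixed soft Q-function and temperature alpha, the optimal MaxEnt policy is proportional to exp(Q / alpha). The discrete action set, the Q-values, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def maxent_policy(q_values, alpha):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha) for one state."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats, with 0 * log(0) treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical Q-values for one state: actions 0 and 4 are two equally
# good "goals", the actions in between are worse.
q = [1.0, 0.0, 0.0, 0.0, 1.0]

soft = maxent_policy(q, alpha=0.5)          # higher temperature, more exploration
near_greedy = maxent_policy(q, alpha=0.01)  # low temperature, almost greedy

print("alpha=0.5 :", [round(p, 3) for p in soft])
print("alpha=0.01:", [round(p, 3) for p in near_greedy])
print("entropies :", round(entropy(soft), 3), round(entropy(near_greedy), 3))
```

In this toy case the target policy places equal mass on both goals, so it has two modes. A single Gaussian over a continuous analogue of this action space could cover only one mode or spread mass between them, which is the limitation the abstract attributes to Gaussian policies.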

machine learning, q-function, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2502.11612

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)


Appendix for "Learning Neural Set Functions Under the Optimal Subset Oracle"

Neural Information Processing Systems, Feb-10-2025, 17:34:22 GMT

The probabilistic greedy model (PGM) solves optimization (1) with a differentiable extension of the greedy maximization algorithm (Tschiatschek et al., 2018). To alleviate this problem, Tschiatschek et al. (2018) construct the set mass function by enumerating all possible permutations p ... However, maximizing the log-likelihood of (14) is prohibitively expensive and unscalable due to the exponential time complexity of enumerating all permutations. Although one can apply a Monte Carlo approximation to avoid that, i.e., approximating log p ... B.1 Derivations of the Maximum Entropy Distribution: The first step in solving problem (2) is to construct a proper set mass function p ... The question is then which set mass function is most appropriate. Generally, we prefer a model that assumes nothing about what is unknown. More formally, we should choose the most "uniform" distribution, the one that maximizes the Shannon entropy $H(p) = -\sum_{S} p(S) \log p(S)$. This principle is known as the "noninformative prior" (Jeffreys, 1946) and has been widely applied in many physical systems (Jaynes, 1957a,b). It turns out that the energy-based model is the only distribution with maximum entropy. More specifically, the following theorem holds: Theorem 1.
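The excerpt cuts off before Theorem 1 is stated, so the following is only a sketch of the general maximum entropy principle it appeals to, not the paper's construction. It assumes a small finite outcome set and a single constraint on the expected value of a hypothetical function f. Under that assumption the entropy maximizer has the exponential (energy-based) form p_i proportional to exp(lam * f_i), and the code compares its entropy against another distribution with the same mean. All names and numbers are illustrative.

```python
import math

def gibbs(f, lam):
    """Energy-based distribution p_i proportional to exp(lam * f_i)."""
    shift = max(lam * fi for fi in f)  # stabilise the exponentials
    weights = [math.exp(lam * fi - shift) for fi in f]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

def entropy(p):
    """Shannon entropy in nats, with 0 * log(0) treated as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def fit_lambda(f, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lam; the mean of gibbs(f, lam) increases with lam."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(gibbs(f, mid), f) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical feature values and a reference distribution with mean 1.0.
f = [0.0, 1.0, 2.0, 3.0]
other = [0.4, 0.3, 0.2, 0.1]
target = mean(other, f)

lam = fit_lambda(f, target)
p_star = gibbs(f, lam)

print("energy-based p:", [round(x, 4) for x in p_star])
print("entropy(p*)   :", round(entropy(p_star), 4))
print("entropy(other):", round(entropy(other), 4))
```

Both distributions satisfy the same mean constraint, and the energy-based one has the larger entropy, consistent with the claim in the excerpt.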

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.81)


DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Celik, Onur, Li, Zechu, Blessing, Denis, Li, Ge, Palanicek, Daniel, Peters, Jan, Chalvatzaki, Georgia, Neumann, Gerhard

arXiv.org Artificial Intelligence, Feb-4-2025

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2502.02316

Country:

North America > United States (0.29)
Europe > Germany (0.28)

Genre: Research Report (0.40)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning

Boucher, Rémy Hosseinkhan, Semeraro, Onofrio, Mathelin, Lionel

arXiv.org Artificial Intelligence, Jan-28-2025

The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, entropy-regularised policies are observed to be robust to noise contamination of the agent's observations. Second, notions of statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.

artificial intelligence, machine learning, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2501.17115

Country:

Europe > France (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.65)


Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness: Supplementary Material

Zhao, Long, Liu, Ting, Peng, Xi

Neural Information Processing Systems, Jan-27-2025, 07:52:01 GMT

To bound the deviation of the entropy estimates, we use McDiarmid's inequality [13], in a manner similar to [1]. For this, we must bound the change in value of each of the entropy estimates when a single instance in S is arbitrarily changed. A useful and easily proven inequality in that regard is the following: for any natural m and for any $a \in [0, 1 - 1/m]$ and $\Delta \le 1/m$, $|(a + \Delta) \log(a + \Delta) - a \log(a)| \le \log(m)/m$. (1) With this inequality, a careful application of McDiarmid's inequality leads to the following lemma. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$ over the sample set, we have $|\hat{H}(T) - \mathbb{E}[\hat{H}(T)]| \le |T| \log(m) \sqrt{\log(2/\delta)} / \sqrt{2m}$. First, we bound the change caused by a single replacement in $\hat{H}(T)$.
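As a sanity check on inequality (1), here is a small numerical sketch. It reads the inequality as |(a + Δ) log(a + Δ) − a log(a)| ≤ log(m)/m for a in [0, 1 − 1/m] and 0 ≤ Δ ≤ 1/m. The choices of natural logarithm, the convention 0 log 0 = 0, the restriction to m ≥ 3, and the grid resolution are assumptions made for this check, not details taken from the source.

```python
import math

def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def max_violation(m, grid=200):
    """Largest value of |g(a + d) - g(a)| - log(m)/m over a grid.

    g(x) = x log x, a ranges over [0, 1 - 1/m], d over [0, 1/m].
    A non-positive result (up to rounding) means the bound held on the grid.
    """
    bound = math.log(m) / m
    worst = -math.inf
    for i in range(grid + 1):
        a = (1.0 - 1.0 / m) * i / grid
        for j in range(grid + 1):
            d = (1.0 / m) * j / grid
            gap = abs(xlogx(a + d) - xlogx(a)) - bound
            worst = max(worst, gap)
    return worst

# Sample sizes chosen for illustration only.
worst_gaps = {m: max_violation(m) for m in (3, 5, 10, 100)}
for m, gap in worst_gaps.items():
    print(f"m={m:4d}  max(|change| - log(m)/m) = {gap:.3e}")
```

On this grid the largest change is attained at a = 0 and Δ = 1/m, where it equals log(m)/m, so the bound is tight there. This per-coordinate bound is the kind of bounded-difference condition McDiarmid's inequality requires.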

artificial intelligence, international conference, machine learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)


Review for NeurIPS paper: Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

Neural Information Processing Systems, Jan-27-2025, 07:51:55 GMT

Weaknesses: It is not clear what the main technical contributions of the paper are. The paper oversells its theoretical results, and the motivation for the proposed regularizer is weak. The paper misrepresents its contributions in terms of cosmetic theorems and lemmas. See points (i) - (iv) below. In the Appendix, Line 70, it is written that "After extending it to the case when Y is a deterministic function of X, we get the bound in Theorem 3".

improved generalization and robustness, maximum-entropy adversarial data augmentation, neurips paper, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)


Review for NeurIPS paper: Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

Neural Information Processing Systems, Jan-27-2025, 07:51:49 GMT

The paper was extensively discussed among the reviewers. The final outcome was that all the reviewers agreed that the theoretical part of the paper is not significantly novel and that the authors have to rewrite that part (please see the updated reviews); however, the approach is novel and the experimental part is strong. To evaluate the experimental part further, a new reviewer with a good understanding of the experimental side of adversarial data augmentation was added after the rebuttal. The new reviewer confirmed that the entropy-based regularization term is significantly useful for providing robustness against unseen shifts.

improved generalization and robustness, maximum-entropy adversarial data augmentation, reviewer, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)


A Proof of Theorem 1. Recall that under maximum entropy RL, the Q-function is defined as $Q^{\pi}_{\mathrm{ent}}(s, a)$ ...

Neural Information Processing Systems, Jan-27-2025, 03:52:32 GMT

We use uncorrected to denote prioritized sampling without IS corrections.

artificial intelligence, ent, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)


Review for NeurIPS paper: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Neural Information Processing Systems, Jan-26-2025, 15:04:13 GMT

Correctness: The main technical content seems to be correct. I have the following questions, though: When using the linear assumption for the reward and the dynamics, the feature selection/setting is crucial. To relax the linear assumption, it is also mentioned that features can be pre-trained. What would be the recommended way to pre-learn them? If the assumptions are violated, how would that affect the results in practice?

average-reward mdp, maximum-entropy approach, off-policy evaluation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)


A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Neural Information Processing Systems, Jan-26-2025, 15:04:13 GMT

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e.

artificial intelligence, function approximation, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.42)
