 maximum entropy reinforcement learning


A Diffusion Model Framework for Maximum Entropy Reinforcement Learning

Sanokowski, Sebastian, Patil, Kaustubh, Knoll, Alois

arXiv.org Machine Learning

Diffusion models have achieved remarkable success in data-driven learning and in sampling from complex, unnormalized target distributions. Building on this progress, we reinterpret Maximum Entropy Reinforcement Learning (MaxEntRL) as a diffusion model-based sampling problem. We tackle this problem by minimizing the reverse Kullback-Leibler (KL) divergence between the diffusion policy and the optimal policy distribution using a tractable upper bound. By applying the policy gradient theorem to this objective, we derive a modified surrogate objective for MaxEntRL that incorporates diffusion dynamics in a principled way. This leads to simple diffusion-based variants of Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO) and Wasserstein Policy Optimization (WPO), termed DiffSAC, DiffPPO and DiffWPO. All of these methods require only minor implementation changes to their base algorithm. We find that on standard continuous control benchmarks, DiffSAC, DiffPPO and DiffWPO achieve better returns and higher sample efficiency than SAC and PPO.
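The reverse-KL objective described in this abstract can be sketched on a toy discrete problem, where the optimal MaxEnt policy is the Boltzmann distribution exp(Q/α)/Z. This is only an illustrative sketch: a plain categorical policy stands in for the paper's diffusion policy, and the Q-function and learning rate are made-up toy values.

```python
import numpy as np

# Toy 1-D bandit: Q-values over a discretized action grid.
# The optimal MaxEnt policy is pi*(a) ∝ exp(Q(a)/alpha).
# We fit a categorical policy by gradient descent on the reverse KL
# D = KL(pi || pi*), which (up to the constant log Z) is E_pi[log pi - Q/alpha].

alpha = 0.5
actions = np.linspace(-2.0, 2.0, 41)
Q = -(actions - 1.0) ** 2          # toy Q-function peaked at a = 1

target = np.exp(Q / alpha)
target /= target.sum()             # pi*, the Boltzmann target

logits = np.zeros_like(actions)    # policy parameters (categorical logits)
lr = 0.5
for _ in range(5000):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    ratio = np.log(pi) - np.log(target)
    kl = np.sum(pi * ratio)
    # exact gradient of the reverse KL w.r.t. the logits:
    # dD/dlogit_j = pi_j * (log pi_j - log pi*_j - D)
    logits -= lr * pi * (ratio - kl)

pi = np.exp(logits - logits.max())
pi /= pi.sum()
kl = np.sum(pi * (np.log(pi) - np.log(target)))
```

After training, `pi` concentrates around a = 1 and the reverse KL is close to zero; the paper's contribution is doing this with a diffusion-model policy via a tractable upper bound rather than with an explicit categorical distribution.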


Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Neural Information Processing Systems

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow). Our method enables the calculation of the soft value function used in the policy evaluation target without Monte Carlo approximation.


Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks

Choe, Jean Seong Bjorn, Choi, Bumkyu, Kim, Jong-kook

arXiv.org Artificial Intelligence

This report presents our reinforcement learning-based approach for the swing-up and stabilisation tasks of the acrobot and pendubot, tailored specifically to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building upon our previously developed Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to effectively address the new competition scenarios and evaluation metrics. Extensive simulations validate that our controller robustly manages these revised tasks, demonstrating adaptability and effectiveness within the updated framework. Building upon prior competitions at IJCAI 2023 [3] and IROS 2024 [4], the current edition places particular emphasis on global policy robustness, requiring reliable swing-up and stabilisation from arbitrary initial configurations under significantly increased external disturbances. The competition maintains its use of two different configurations: the acrobot, characterised by an inactive shoulder joint, and the pendubot, with an inactive elbow joint.


Agent Teaming in Mixed-Motive Situations – an AAAI Fall symposium

AIHub

Professor Subbarao Kambhampati's (Arizona State University) keynote discussed the dual nature of mental modeling in cooperation and competition. The importance of obfuscatory behavior, controlled observability planning, and the use of explanations for model reconciliation was emphasized, particularly regarding trust-building in human-robot interactions. Professor Gita Sukthankar's (University of Central Florida) talk focused on challenges in teamwork, using a case study on software engineering teams. Innovative techniques for distinguishing effective teams from ineffective ones were explored, setting the stage for discussions on the complexities of mixed-motive scenarios. Dr Marc Steinberg (Office of Naval Research) moderated an interactive discussion exploring research challenges in mixed-motive teams, including modeling humans, experimental setups, and measuring and assessing mixed-motive situations.


Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

Hu, Dailin, Abbeel, Pieter, Fox, Roy

arXiv.org Artificial Intelligence

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness. Most MaxEnt RL methods, however, use a constant tradeoff coefficient (temperature), contrary to the intuition that the temperature should be high early in training to avoid overfitting to noisy value estimates and decrease later in training as we increasingly trust high value estimates to truly lead to good rewards. Moreover, our confidence in value estimates is state-dependent, increasing every time we use more evidence to update an estimate. In this paper, we present a simple state-based temperature scheduling approach, and instantiate it for SQL as Count-Based Soft Q-Learning (CBSQL). We evaluate our approach on a toy domain as well as in several Atari 2600 domains and show promising results.
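The state-based scheduling idea can be sketched as a per-state temperature that decays with the visitation count N(s), so rarely visited states keep a high temperature while well-explored states cool down. The 1/√N decay rate and the class interface below are illustrative assumptions, not necessarily the exact schedule used in CBSQL.

```python
import math
from collections import defaultdict

class CountBasedTemperature:
    """Per-state temperature schedule alpha(s) = alpha0 / sqrt(max(N(s), 1)).

    N(s) is a simple visitation counter; the decay rate is an illustrative
    choice, not the specific schedule from the CBSQL paper.
    """

    def __init__(self, alpha0=1.0):
        self.alpha0 = alpha0
        self.counts = defaultdict(int)   # N(s), zero for unseen states

    def update(self, state):
        # Call once per visit to `state` (e.g. per environment step).
        self.counts[state] += 1

    def temperature(self, state):
        # Unseen states get the full initial temperature alpha0.
        return self.alpha0 / math.sqrt(max(self.counts[state], 1))
```

In a soft Q-learning loop, `temperature(s)` would replace the constant α in the soft Bellman backup, so the entropy bonus shrinks precisely where value estimates have been updated most often.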