Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

Zhu, Lingwei, Chen, Zheng, Schlegel, Matthew, White, Martha

Oct-24-2023 - arXiv.org Artificial Intelligence

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence, called the Tsallis KL divergence, which uses the $q$-logarithm in its definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches for incorporating KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.
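
To make the claim that $q = 1$ recovers the standard KL divergence concrete, here is a minimal sketch of a $q$-logarithm and the divergence built from it. It assumes the common Tsallis convention $\ln_q x = (x^{1-q} - 1)/(1-q)$ and $D_q(p \| r) = -\sum_a p(a) \ln_q (r(a)/p(a))$; the abstract does not state the convention, and the paper's own definition may differ (for example, in the sign of the exponent). The function names `q_log` and `tsallis_kl` are hypothetical and chosen only for illustration.

```python
import math


def q_log(x, q):
    """q-logarithm under the ASSUMED convention (x**(1-q) - 1) / (1-q).

    Falls back to the natural logarithm at q = 1, which is the limit
    of the expression above as q -> 1.
    """
    if x <= 0:
        raise ValueError("q_log is only defined here for x > 0")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_kl(p, r, q):
    """Assumed form D_q(p || r) = -sum_a p(a) * ln_q(r(a) / p(a)).

    p and r are sequences of probabilities over the same actions.
    Actions with p(a) = 0 contribute nothing; r(a) must be positive
    wherever p(a) is positive.
    """
    return -sum(pa * q_log(ra / pa, q) for pa, ra in zip(p, r) if pa > 0)


# Illustrative distributions (not taken from the paper).
new_policy = [0.7, 0.2, 0.1]
old_policy = [0.5, 0.3, 0.2]

standard_kl = sum(pa * math.log(pa / ra) for pa, ra in zip(new_policy, old_policy))
print(tsallis_kl(new_policy, old_policy, 1.0), standard_kl)  # the two values agree
print(tsallis_kl(new_policy, old_policy, 2.0))               # a q > 1 variant
```

Under this convention, $q = 2$ reduces to $\sum_a p(a)^2 / r(a) - 1$, which penalizes large ratios $p(a)/r(a)$ polynomially rather than logarithmically.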

generalized munchausen reinforcement learning, tsallis kl divergence

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.53)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)
