General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence

Neural Information Processing Systems 

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence, called the Tsallis KL divergence. The Tsallis KL divergence, defined via the q-logarithm, is a strict generalization: q = 1 corresponds to the standard KL divergence, while q > 1 provides a range of new options. We characterize the types of policies learned under the Tsallis KL divergence, and motivate when q > 1 could be beneficial.
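
For concreteness, one common convention for the q-logarithm and the Tsallis KL divergence it induces is sketched below; the symbols \(\pi\) (current policy) and \(\bar{\pi}\) (previous policy) are illustrative, and the paper's exact normalization may differ:

\[
\ln_q(x) = \frac{x^{1-q} - 1}{1 - q} \quad (q \neq 1), \qquad \lim_{q \to 1} \ln_q(x) = \ln(x),
\]
\[
D^{q}_{\mathrm{KL}}(\pi \,\|\, \bar{\pi}) = \mathbb{E}_{a \sim \pi}\!\left[ -\ln_q\!\left( \frac{\bar{\pi}(a)}{\pi(a)} \right) \right],
\]

so that q = 1 recovers the standard KL divergence \(D_{\mathrm{KL}}(\pi \,\|\, \bar{\pi}) = \mathbb{E}_{a \sim \pi}[\ln(\pi(a)/\bar{\pi}(a))]\).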