Mutual Information Regularized Offline Reinforcement Learning

Neural Information Processing Systems 

We show that optimizing this mutual information lower bound is equivalent to maximizing the likelihood of a one-step improved policy on the offline dataset. Hence, the policy improvement direction is constrained to lie within the data manifold.
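The following is a minimal sketch of one reading of this statement, not the paper's exact algorithm: the "one-step improved" policy is taken to be pi'(a|s) proportional to pi(a|s) * exp(Q(s,a)/tau), and its log-likelihood is maximized on dataset (state, action) pairs, which keeps the improvement direction tied to actions covered by the data. All names here (`PolicyNet`, `QNet`, `tau`, `n_samples`) are illustrative assumptions introduced for this sketch.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Gaussian policy with a state-dependent mean and a learned, state-independent log-std."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

class QNet(nn.Module):
    """Simple state-action value network."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def improved_policy_nll(policy, q_net, states, actions, tau=1.0, n_samples=8):
    """Negative log-likelihood of dataset actions under the one-step improved policy.

    Using pi'(a|s) ∝ pi(a|s) exp(Q(s,a)/tau):
        log pi'(a|s) = log pi(a|s) + Q(s,a)/tau - log Z(s),
    with Z(s) = E_{a~pi}[exp(Q(s,a)/tau)] estimated by sampling from the current policy.
    Minimizing this loss maximizes the improved policy's likelihood on the offline data.
    """
    dist = policy.dist(states)
    log_pi = dist.log_prob(actions).sum(-1)            # log pi(a|s) on dataset actions
    q_data = q_net(states, actions).squeeze(-1) / tau  # Q(s,a)/tau on dataset actions

    # Monte-Carlo estimate of log Z(s) via actions sampled from the current policy
    sampled = dist.sample((n_samples,))                # (n_samples, B, action_dim)
    s_rep = states.unsqueeze(0).expand(n_samples, *states.shape)
    q_samp = q_net(
        s_rep.reshape(-1, states.shape[-1]),
        sampled.reshape(-1, actions.shape[-1]),
    ).view(n_samples, -1) / tau
    log_z = torch.logsumexp(q_samp, dim=0) - torch.log(torch.tensor(float(n_samples)))

    return -(log_pi + q_data - log_z).mean()

# Illustrative usage on random tensors standing in for an offline batch
states, actions = torch.randn(32, 17), torch.randn(32, 6)
policy, q_net = PolicyNet(17, 6), QNet(17, 6)
loss = improved_policy_nll(policy, q_net, states, actions)
```

Under this reading, the regularizer reduces to an advantage-weighted log-likelihood on dataset actions, so the policy is only pushed toward actions the behavior data already supports.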
