Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Oct-9-2024, 18:32:08 GMT–Neural Information Processing Systems

Offline reinforcement learning (RL) enables learning a decision-making policy without interaction with the environment. This makes it particularly beneficial in situations where such interactions are costly. However, a known challenge for offline RL algorithms is the distributional mismatch between the state-action distributions of the learned policy and the dataset, which can significantly impact performance. State-of-the-art algorithms address it by constraining the policy to align with the state-action pairs in the dataset. However, this strategy struggles on datasets that predominantly consist of trajectories collected by low-performing policies and only a few trajectories from high-performing ones. Indeed, the constraint to align with the data leads the policy to imitate low-performing behaviors predominating the dataset.

imbalanced dataset, offline reinforcement learning, uniform sampling, (3 more...)

Neural Information Processing Systems

Oct-9-2024, 18:32:08 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)