Reinforcement Learning in Reward-Mixing MDPs

Jan-21-2025, 19:57:06 GMT–Neural Information Processing Systems

Learning a near-optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP). In this setting, a reward function is drawn from one of M possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near-optimal policy for reward-mixing MDPs with two reward models (M = 2).
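To make the setting concrete, the following is a minimal simulation sketch of one episode in a reward-mixing MDP. It is an illustration only and not the learning algorithm studied in the paper. The class name RewardMixingMDP, the method run_episode, the tabular layout of the transition and reward tables, and the use of Bernoulli rewards are all assumptions introduced here. The latent model index is drawn once per episode and never appears in the returned trajectory, so the policy receives the full history rather than only the current state.

```python
import random


class RewardMixingMDP:
    """Toy reward-mixing MDP (hypothetical): shared transitions, M latent reward models."""

    def __init__(self, transitions, reward_models, weights, horizon, seed=0):
        self.transitions = transitions      # transitions[s][a] -> next-state probabilities
        self.reward_models = reward_models  # reward_models[m][s][a] -> Bernoulli reward mean (assumed)
        self.weights = weights              # mixing weights over the M reward models
        self.horizon = horizon              # number of steps per episode
        self.rng = random.Random(seed)

    def run_episode(self, policy, start_state=0):
        # The latent reward model is drawn once per episode and is never shown to the policy.
        m = self.rng.choices(range(len(self.reward_models)), weights=self.weights)[0]
        state, trajectory = start_state, []
        for t in range(self.horizon):
            # The policy sees the history, since the observed process is not Markovian in `state`.
            action = policy(t, state, trajectory)
            reward = 1 if self.rng.random() < self.reward_models[m][state][action] else 0
            probs = self.transitions[state][action]
            next_state = self.rng.choices(range(len(probs)), weights=probs)[0]
            trajectory.append((state, action, reward))
            state = next_state
        return trajectory  # the latent index m is deliberately not returned


# Two states, two actions, M = 2 reward models with opposite deterministic rewards.
transitions = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.2, 0.8], [0.5, 0.5]],
]
reward_models = [
    [[1.0, 0.0], [1.0, 0.0]],  # model 0: action 0 always pays
    [[0.0, 1.0], [0.0, 1.0]],  # model 1: action 1 always pays
]
env = RewardMixingMDP(transitions, reward_models, weights=[0.5, 0.5], horizon=4)
trajectory = env.run_episode(lambda t, state, history: 0)  # always play action 0
```

In this toy instance the rewards observed for action 0 are either all 1 or all 0 within an episode, depending on the hidden draw. The reward history therefore carries information about the latent model even though the model's identity is never revealed.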

artificial intelligence, machine learning, reinforcement learning



Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)