11f9e78e4899a78dedd439fc583b6693-Paper.pdf

Feb-7-2026, 13:34:30 GMT–Neural Information Processing Systems

There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near optimal policy for two reward-mixing MDPs. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality.
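
The episodic setting described above can be illustrated with a small simulation. This is only a toy sketch, not the paper's construction: the class name `RewardMixingMDP`, the tabular layout, the Bernoulli rewards, the fixed start state, and the assumption that all reward models share one transition kernel are illustrative choices that the abstract does not specify.

```python
import random


class RewardMixingMDP:
    """Toy episodic environment with hidden reward models (hypothetical sketch).

    Assumptions not taken from the source: tabular states and actions,
    Bernoulli rewards, a shared transition kernel, and start state 0.
    """

    def __init__(self, transitions, reward_models, mixing_weights, horizon, seed=0):
        self.transitions = transitions        # transitions[s][a] -> next-state probabilities
        self.reward_models = reward_models    # reward_models[m][s][a] -> Bernoulli mean
        self.mixing_weights = mixing_weights  # probability of each reward model
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        # Draw the latent reward model once per episode; it is kept private
        # and never returned to the agent.
        models = range(len(self.reward_models))
        self._m = self.rng.choices(models, weights=self.mixing_weights)[0]
        self._t = 0
        self._s = 0
        return self._s

    def step(self, action):
        # The reward depends on the hidden model, the current state, and the action.
        mean = self.reward_models[self._m][self._s][action]
        reward = 1 if self.rng.random() < mean else 0
        # The next state depends only on the observed state and the action.
        probs = self.transitions[self._s][action]
        self._s = self.rng.choices(range(len(probs)), weights=probs)[0]
        self._t += 1
        done = self._t >= self.horizon
        return self._s, reward, done
```

The agent only ever sees the tuple returned by `step`, so from its point of view the reward process is not Markovian in the observed state: the index `_m` is part of the latent state but is never observed.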



