Learning in Observable POMDPs, without Computationally Intractable Oracles
Neural Information Processing Systems
Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g.
Apr-24-2026, 11:30:06 GMT