Reviews: Learning Reward Machines for Partially Observable Reinforcement Learning
Neural Information Processing Systems
The authors propose a novel approach to solving POMDPs that learns a reward machine and a policy for it simultaneously. The method relies on building a finite state machine that correctly predicts the possible observations and rewards. The authors demonstrate that their method outperforms baselines in three different partially observable gridworlds. Overall, I found the paper clear and well motivated. Learning to solve POMDPs is a very challenging problem, and any progress or insight could have a big impact.
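For readers unfamiliar with the term, a reward machine can be pictured roughly as below. This is only a minimal sketch of the general idea of a finite state machine that emits rewards, not the authors' learning method; the `RewardMachine` class, the event labels, and the key-then-door task are all hypothetical and chosen for illustration. The machine's current state acts as a small memory of what has already happened, which is the kind of information a partially observable agent cannot read off a single observation.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite state machine whose transitions fire on events and emit rewards."""

    initial_state: int
    # Maps (state, event) -> (next_state, reward). Hypothetical encoding.
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Unlisted (state, event) pairs keep the machine in place with zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Replay a sequence of events; return the final state and total reward."""
        state, total = self.initial_state, 0.0
        for event in events:
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical task: pick up a key first, then reach the door.
rm = RewardMachine(
    initial_state=0,
    transitions={
        (0, "key"): (1, 0.0),   # key collected, no reward yet
        (1, "door"): (2, 1.0),  # door reached with the key: reward 1
    },
)

# Reaching the door before the key does nothing; afterwards it pays off.
final_state, total_reward = rm.run(["door", "key", "door"])
```

In this sketch the machine is written by hand; in the reviewed paper the machine itself is learned from experience, which the example does not attempt to show.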
Jan-23-2025, 17:59:19 GMT