Reviews: Thinking Fast and Slow with Deep Learning and Tree Search
Neural Information Processing Systems
SUMMARY: The paper proposes an algorithm that combines imitation learning with tree search, so that an apprentice learns from an ever-improving expert. A DQN is trained to imitate a policy derived from an MCTS agent, with the DQN providing generalisation to unseen states. The trained DQN is in turn used as feedback to improve the expert, which is then used to retrain the DQN, and so on. The paper makes two contributions: (1) a new target for imitation learning, which is empirically shown to outperform a previously suggested target and which results in the apprentice learning a policy of equal strength to the expert.

COMMENTS: I found the paper generally well written, clear and easy to follow, with the exception of Section 6.
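As a rough illustration of the expert/apprentice loop summarised above, here is a minimal Python sketch under loudly stated assumptions: a tabular softmax policy stands in for the DQN apprentice, a depth-1 PUCT search over a hypothetical fixed reward table stands in for MCTS, and the imitation target is assumed to be the search's normalised visit counts. None of the names (`REWARDS`, `expert_search`, `train_apprentice`, `expert_iteration`) come from the paper; this is not the authors' implementation.

```python
import math

# HYPOTHETICAL toy problem (an assumption, not from the paper): each state is
# a one-step decision with a fixed reward per action.
REWARDS = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.4],
    [0.0, 0.5, 0.8],
]
NUM_STATES, NUM_ACTIONS = len(REWARDS), len(REWARDS[0])


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_search(state, prior, budget=50, c_puct=1.0):
    """Expert: depth-1 PUCT search guided by the apprentice's prior.

    Stand-in for MCTS. Returns normalised visit counts, which this sketch
    ASSUMES is the imitation target.
    """
    visits = [0] * NUM_ACTIONS
    value_sum = [0.0] * NUM_ACTIONS

    def score(a, total):
        # Mean observed value plus a prior-weighted exploration bonus.
        q = value_sum[a] / visits[a] if visits[a] else 0.0
        u = c_puct * prior[a] * math.sqrt(total + 1) / (1 + visits[a])
        return q + u

    for _ in range(budget):
        total = sum(visits)
        a = max(range(NUM_ACTIONS), key=lambda b: score(b, total))
        visits[a] += 1
        value_sum[a] += REWARDS[state][a]  # deterministic "rollout" value
    return [v / budget for v in visits]


def train_apprentice(logits, targets, lr=0.5, epochs=100):
    """Apprentice: cross-entropy gradient steps toward the expert targets."""
    for _ in range(epochs):
        for s in range(NUM_STATES):
            probs = softmax(logits[s])
            for a in range(NUM_ACTIONS):
                # d(cross-entropy)/d(logit_a) = p_a - target_a
                logits[s][a] -= lr * (probs[a] - targets[s][a])
    return logits


def expert_iteration(iterations=5):
    """Alternate expert improvement and imitation learning."""
    logits = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]
    for _ in range(iterations):
        # Expert improvement: search guided by the current apprentice.
        targets = [expert_search(s, softmax(logits[s])) for s in range(NUM_STATES)]
        # Imitation learning: the apprentice fits the improved expert.
        logits = train_apprentice(logits, targets)
    return logits
```

Calling `expert_iteration()` returns apprentice logits whose argmax in each state is the highest-reward action of the toy table. Feeding the apprentice's own policy back into the search as a prior is the step that lets each round's expert build on the previous apprentice.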
Oct-8-2024, 10:17:28 GMT