Reviews: Learning Beam Search Policies via Imitation Learning
–Neural Information Processing Systems
This paper develops a unifying formalism (a meta-algorithm) for imitation learning in the context of beam search; that is, learning in a beam-search-aware manner with access to an oracle that will tell you the minimum completion cost for any partial state. This formalism exposes a subtle but important corner of the space that has not yet been explored: algorithms that are beam aware, but which do not reset or stop after the gold-standard falls out of the beam, but instead continue along the best possible remaining path. Using this formalism, they are able to develop regret guarantees for this new case, alongside novel guarantees for existing cases where the algorithm stops or resets. They show the case where the algorithm continues to have better regret guarantees. This paper is a masterwork of notation. No symbol is wasted, no symbol is under-explained.
Neural Information Processing Systems
Oct-7-2024, 20:14:56 GMT
- Technology: