Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management Anonymous Author(s) Affiliation Address email

Neural Information Processing Systems 

Reinforcement learning (RL) has shown great promise for developing dialogue1 management (DM) agents that are non-myopic, conduct rich conversations, and2 maximize overall user satisfaction. Despite recent developments in RL and lan-3 guage models (LMs), using RL to power conversational chatbots remains challeng-4 ing, in part because RL requires online exploration to learn effectively, whereas5 collecting novel human-bot interactions can be expensive and unsafe. This issue is6 exacerbated by the combinatorial action spaces facing these algorithms, as most7 LM agents generate responses at the word level. We develop a variety of RL algo-8 rithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert9 Language Models (MoE-LMs)--models that capture diverse semantics, generate10 utterances reflecting different intents, and are amenable for multi-turn DM. By11 exploiting MoE-LM structure, our methods significantly reduce the size of the12 action space and improve the efficacy of RL-based DM.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found