Goto

Collaborating Authors: Kwon, Taehwan


Hexa: Self-Improving for Knowledge-Grounded Dialogue System

arXiv.org Artificial Intelligence

A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) within modular approaches. However, data for such steps are often inaccessible compared to data for dialogue responses, as these steps are unobservable in ordinary dialogue. To compensate for the absence of these data, we develop a self-improving method that improves the generative performance of intermediate steps without ground-truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses, and improves performance on the task of knowledge-grounded dialogue generation.

Along with the progress of Language Model (LM) pretraining, open-domain dialogue models have evolved to leverage the generalization ability of the transformer architecture (Zhang et al., 2019; Freitas et al., 2020; Roller et al., 2021; Xu et al., 2022a; Shuster et al., 2022b; Thoppilan et al., 2022). While model scaling also improves dialogue quality (Freitas et al., 2020), as seen in large LMs, relying on the LM alone introduces limitations such as hallucination and a lack of faithfulness due to outdated training data (Brown et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022). To overcome these limitations, prior works have adopted a modular design in which multiple modules generate intermediate texts (e.g., to retrieve documents) before the final response (Lewis et al., 2020; Adolphs et al., 2021; Zhang et al., 2021; Shuster et al., 2022a). Among them, Komeili et al. (2022) and Shuster et al. (2022b) have shown promising results in dialogue generation. Specifically, they adopted a modular design to integrate external knowledge (e.g., the internet) and internal knowledge (e.g., memory) into dialogue models. For example, in Komeili et al. (2022), an LM first decides, in the form of text generation, whether to access knowledge. Upon deciding to do so, the LM generates an appropriate query for knowledge retrieval from external sources such as search engines. Then, the LM generates a response based on the knowledge extracted from the accessed data. See Figure 2 of Appendix A for an illustrative example. Regarding each intermediate phase as a separate module, a convenient way to train these modules is to apply supervised learning to each module using individual datasets (Dinan et al., 2019; Shuster et al., 2022a; Glass et al., 2022; Shuster et al., 2022b).
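To make the pipeline walked through above concrete, here is a minimal sketch of the decide-then-search-then-respond loop, assuming hypothetical `lm_generate` and `web_search` callables; the prompt strings and names are illustrative, not the paper's actual prompts or code.

```python
# A minimal sketch of the modular pipeline described above (decide ->
# query -> extract -> respond). `lm_generate` and `web_search` are
# hypothetical callables standing in for the LM and the search engine.

def knowledge_grounded_reply(lm_generate, web_search, dialogue_history):
    # Step 1: the LM decides, via text generation, whether this turn
    # needs external knowledge at all.
    decision = lm_generate(f"{dialogue_history}\nSearch needed (yes/no)?")
    knowledge = ""
    if decision.strip().lower().startswith("yes"):
        # Step 2: the LM writes a search query for the external source.
        query = lm_generate(f"{dialogue_history}\nSearch query:")
        documents = web_search(query)
        # Step 3: the LM extracts the relevant knowledge from the results.
        knowledge = lm_generate(f"{documents}\nRelevant knowledge:")
    # Step 4: the final response is conditioned on the extracted knowledge.
    return lm_generate(
        f"{dialogue_history}\nKnowledge: {knowledge}\nResponse:"
    )
```

The intermediate outputs here (the decision, the query, the extracted knowledge) are exactly the steps for which ground-truth data are usually unavailable, which is what the self-improving bootstrapping scheme targets.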


Effortless Integration of Memory Management into Open-Domain Conversation Systems

arXiv.org Artificial Intelligence

Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. However, one limitation of such systems is the absence of a management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exist for this purpose, we propose an automated dataset-creation method for memory management. Our method 1) incurs little cost for data construction, 2) does not degrade performance on other tasks, and 3) reduces the size of the external memory. We show that our proposed model, BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 by a relative 4% gain in F1 score.
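As an illustration of what such automated dataset creation could look like, here is a minimal sketch that derives memory-management training targets from dialogue turns; the dialogue schema, the `contradicts` predicate, and the keep/delete target format are assumptions for illustration, not the paper's actual pipeline.

```python
# A minimal sketch of automated dataset creation for memory management,
# in the spirit described above. The field names, the `contradicts`
# predicate, and the target format are illustrative assumptions.

def build_memory_management_examples(dialogues, contradicts):
    """Turn dialogues into (memory state, new fact) -> action examples."""
    examples = []
    for dialogue in dialogues:
        memory = []  # external memory accumulated over the conversation
        for turn in dialogue["turns"]:
            for new_fact in turn["persona_facts"]:
                # Stored entries the new fact contradicts become deletion
                # targets; otherwise the model learns to keep everything.
                stale = [m for m in memory if contradicts(new_fact, m)]
                target = ("delete: " + "; ".join(stale)) if stale else "keep all"
                examples.append({
                    "memory": list(memory),
                    "new_fact": new_fact,
                    "target": target,
                })
                # Update memory so later examples see a consistent state.
                memory = [m for m in memory if m not in stale]
                memory.append(new_fact)
    return examples
```

A construction of this shape would explain the three properties claimed above: it needs no human labeling, it is added as one extra task alongside the existing ones, and executing the learned deletions shrinks the external memory over time.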


Variational Intrinsic Control Revisited

arXiv.org Machine Learning

In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and another that does so implicitly. We show that the intrinsic reward used in the latter is biased in stochastic environments, causing convergence to suboptimal solutions. To correct this behavior, we propose two methods, based on the transition probability model and on the Gaussian mixture model, respectively. We substantiate our claims through rigorous mathematical derivations and experimental analyses.
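For context, this is the variational lower bound that VIC maximizes, transcribed here (our transcription, not an excerpt from the paper) in the notation of Gregor et al. (2016): options \Omega are drawn from a prior conditioned on the start state s_0, and a variational posterior q infers the option from the final state s_f.

```latex
% Mutual-information objective and its variational lower bound (VIC):
\begin{align}
I(\Omega; s_f \mid s_0)
  &= H(\Omega \mid s_0) - H(\Omega \mid s_0, s_f) \\
  &\geq \mathbb{E}\!\left[\log q(\Omega \mid s_0, s_f)
        - \log p(\Omega \mid s_0)\right],
\end{align}
% giving the per-trajectory intrinsic reward
\begin{equation}
r(\Omega, s_f) = \log q(\Omega \mid s_0, s_f) - \log p(\Omega \mid s_0).
\end{equation}
% In a stochastic environment, s_f reflects environment noise as well as
% the chosen option, which is where the bias discussed above can enter
% when the options are represented only implicitly.
```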