Goto

Collaborating Authors

 Lin, Zehao


SurveyX: Academic Survey Automation via Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated exceptional comprehension capabilities and a vast knowledge base, suggesting that LLMs can serve as efficient tools for automated survey generation. However, recent research related to automated survey generation remains constrained by some critical limitations like finite context window, lack of in-depth content discussion, and absence of systematic evaluation frameworks. Inspired by human writing processes, we propose SurveyX, an efficient and organized system for automated survey generation that decomposes the survey composing process into two phases: the Preparation and Generation phases. By innovatively introducing online reference retrieval, a pre-processing method called AttributeTree, and a re-polishing process, SurveyX significantly enhances the efficacy of survey composition. Experimental evaluation results show that SurveyX outperforms existing automated survey generation systems in content quality (0.259 improvement) and citation quality (1.76 enhancement), approaching human expert performance across multiple evaluation dimensions. Examples of surveys generated by SurveyX are available on www.surveyx.cn


$\text{Memory}^3$: Language Modeling with Explicit Memory

arXiv.org Artificial Intelligence

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.


Teacher-Student Framework Enhanced Multi-domain Dialogue Generation

arXiv.org Artificial Intelligence

Dialogue systems dealing with multi-domain tasks are highly required. How to record the state remains a key problem in a task-oriented dialogue system. Normally we use human-defined features as dialogue states and apply a state tracker to extract these features. However, the performance of such a system is limited by the error propagation of a state tracker. In this paper, we propose a dialogue generation model that needs no external state trackers and still benefits from human-labeled semantic data. By using a teacher-student framework, several teacher models are firstly trained in their individual domains, learn dialogue policies from labeled states. And then the learned knowledge and experience are merged and transferred to a universal student model, which takes raw utterance as its input. Experiments show that the dialogue system trained under our framework outperforms the one uses a belief tracker.


MTSS: Learn from Multiple Domain Teachers and Become a Multi-domain Dialogue Expert

arXiv.org Artificial Intelligence

How to build a high-quality multi-domain dialogue system is a challenging work due to its complicated and entangled dialogue state space among each domain, which seriously limits the quality of dialogue policy, and further affects the generated response. In this paper, we propose a novel method to acquire a satisfying policy and subtly circumvent the knotty dialogue state representation problem in the multi-domain setting. Inspired by real school teaching scenarios, our method is composed of multiple domain-specific teachers and a universal student. Each individual teacher only focuses on one specific domain and learns its corresponding domain knowledge and dialogue policy based on a precisely extracted single domain dialogue state representation. Then, these domain-specific teachers impart their domain knowledge and policies to a universal student model and collectively make this student model a multi-domain dialogue expert. Experiment results show that our method reaches competitive results with SOTAs in both multi-domain and single domain setting.


Task-Oriented Conversation Generation Using Heterogeneous Memory Networks

arXiv.org Artificial Intelligence

How to incorporate external knowledge into a neural dialogue model is critically important for dialogue systems to behave like real humans. To handle this problem, memory networks are usually a great choice and a promising way. However, existing memory networks do not perform well when leveraging heterogeneous information from different sources. In this paper, we propose a novel and versatile external memory networks called Heterogeneous Memory Networks (HMNs), to simultaneously utilize user utterances, dialogue history and background knowledge tuples. In our method, historical sequential dialogues are encoded and stored into the context-aware memory enhanced by gating mechanism while grounding knowledge tuples are encoded and stored into the context-free memory. During decoding, the decoder augmented with HMNs recurrently selects each word in one response utterance from these two memories and a general vocabulary. Experimental results on multiple real-world datasets show that HMNs significantly outperform the state-of-the-art data-driven task-oriented dialogue models in most domains.