 knowledge retriever


Review for NeurIPS paper: Zero-Resource Knowledge-Grounded Dialogue Generation

Neural Information Processing Systems

Weaknesses: - It is hard to judge whether the proposed method achieves good results because of the proposed learning method or because of the strong pretrained UniLM model. Even though the authors compare it with DialoGPT in the appendix, I would also like to see the model's performance without UniLM initialization, or DialoGPT finetuned on the proposed dataset (e.g., Reddit conversations with top-1 retrieved knowledge). How do you select knowledge for ITDD? (ii) All the examples and details of the human evaluation say that the authors use ground-truth knowledge. Do all the models use GT knowledge at test time, or do they use the top-10 knowledge retrieved by the Lucene knowledge retriever? If so, the performance of some baselines would need to be revised.


Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs

Conia, Simone, Lee, Daniel, Li, Min, Minhas, Umar Farooq, Potdar, Saloni, Li, Yunyao

arXiv.org Artificial Intelligence

Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.
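For readers unfamiliar with the metric, the relative improvements quoted above can be unpacked as follows. This is a sketch assuming the standard definition of relative improvement; the symbols $s_{\text{KG-MT}}$ and $s_{\text{base}}$ are illustrative names for the two systems' scores on the same metric, not notation taken from the paper.

```latex
\[
\text{relative improvement}
  = \frac{s_{\text{KG-MT}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
\]
\[
129\% \;\Rightarrow\; s_{\text{KG-MT}} = 2.29\, s_{\text{NLLB-200}},
\qquad
62\% \;\Rightarrow\; s_{\text{KG-MT}} = 1.62\, s_{\text{GPT-4}}
\]
```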


AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs

Gao, Mingzhe, Zhao, Jieru, Lin, Zhe, Ding, Wenchao, Hou, Xiaofeng, Feng, Yu, Li, Chao, Guo, Minyi

arXiv.org Artificial Intelligence

Recently, the use of large language models (LLMs) for software code generation, e.g., C/C++ and Python, has proven a great success. However, LLMs still suffer from low syntactic and functional correctness when it comes to the generation of register-transfer level (RTL) code, such as Verilog. To address this issue, in this paper, we develop AutoVCoder, a systematic open-source framework that significantly improves the LLMs' correctness of generating Verilog code and enhances the quality of its output at the same time. Experimental results demonstrate that AutoVCoder outperforms both industrial and academic LLMs in Verilog code generation. Specifically, AutoVCoder shows a 0.5% and 2.2% improvement in functional correctness on the EvalMachine and EvalHuman benchmarks compared with BetterV, and also achieves a 3.4% increase in syntax correctness and a 3.4% increase in functional correctness on the RTLLM benchmark compared with RTLCoder.

I. INTRODUCTION

Large Language Models (LLMs) have increasingly captured the attention of academia and industry. In the realm of programming, LLMs have demonstrated remarkable success in generating software code, automating and streamlining the development process of programming languages like C, C++, and Python. Recently, some representative works [1, 2, 3, 4, 5, 6], including CodeT5 [1], CodeGen [2], CodeGeeX [3], have made tremendous breakthroughs in augmenting LLMs for software code generation. Additionally, commercial tools such as Copilot [7] and GPT-4 [8] have demonstrated notable performance in code generation. The progress is largely driven by advances in model architecture, training techniques, and most importantly, the vast amounts of data on which these models are trained.


Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

An, Wenbin, Tian, Feng, Nie, Jiahao, Shi, Wenkai, Lin, Haonan, Chen, Yan, Wang, QianYing, Wu, Yaqiang, Dai, Guang, Chen, Ping

arXiv.org Artificial Intelligence

Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the "forward-only" answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answering quality. To cope with the above limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM's feedback to specify the required knowledge. Specifically, DKA requires LLMs to specify what knowledge they need to answer the question and decompose the original complex question into two simple sub-questions: Image-based sub-question and Knowledge-based sub-question. Then we use the two sub-questions to retrieve knowledge from the image and knowledge base, respectively. In this way, two knowledge acquisition models can focus on the content that corresponds to them and avoid disturbance of irrelevant elements in the original complex question, which can help to provide more precise knowledge and better align the knowledge needs of LLMs to yield correct answers. Experiments on benchmark datasets show that DKA significantly outperforms SOTA models. To facilitate future research, our data and code are available at https://github.com/Lackel/DKA.


Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification

Zhao, Chenyu, Jiang, Yunjiang, Qiu, Yiming, Zhang, Han, Yang, Wen-Yun

arXiv.org Artificial Intelligence

Retrieval augmentation, which enhances downstream models with a knowledge retriever and an external corpus instead of merely increasing the number of model parameters, has been successfully applied to many natural language processing (NLP) tasks such as text classification and question answering. However, existing methods train the retriever and the downstream model separately or asynchronously, mainly because of the non-differentiability between the two parts, and this usually leads to degraded performance compared to end-to-end joint training. In this paper, we propose Differentiable Retrieval Augmentation via Generative lANguage modeling (Dragan) to address this problem through a novel differentiable reformulation. We demonstrate the effectiveness of our proposed method on a challenging NLP task in e-commerce search, namely query intent classification. Both the experimental results and the ablation study show that the proposed method significantly and reasonably improves on the state-of-the-art baselines in both offline evaluation and an online A/B test.


Leveraging Explicit Procedural Instructions for Data-Efficient Action Prediction

White, Julia, Raghuvanshi, Arushi, Pruksachatkun, Yada

arXiv.org Artificial Intelligence

Task-oriented dialogues often require agents to enact complex, multi-step procedures in order to meet user requests. While large language models have found success automating these dialogues in constrained environments, their widespread deployment is limited by the substantial quantities of task-specific data required for training. This paper presents a data-efficient solution to constructing dialogue systems, leveraging explicit instructions derived from agent guidelines, such as company policies or customer service manuals. Our proposed Knowledge-Augmented Dialogue System (KADS) combines a large language model with a knowledge retrieval module that pulls documents outlining relevant procedures from a predefined set of policies, given a user-agent interaction. To train this system, we introduce a semi-supervised pre-training scheme that employs dialogue-document matching and action-oriented masked language modeling with partial parameter freezing. We evaluate the effectiveness of our approach on prominent task-oriented dialogue datasets, Action-Based Conversations Dataset and Schema-Guided Dialogue, for two dialogue tasks: action state tracking and workflow discovery. Our results demonstrate that procedural knowledge augmentation improves accuracy in predicting in- and out-of-distribution actions while preserving high performance in settings with low or sparse data.


Reimagining Retrieval Augmented Language Models for Answering Queries

Tan, Wang-Chiew, Li, Yuliang, Rodriguez, Pedro, James, Richard, Lin, Xi Victoria, Halevy, Alon, Yih, Scott

arXiv.org Artificial Intelligence

We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric: they integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-parametric architectures can be enhanced with views, a query analyzer/planner, and provenance to make a significantly more powerful system for question answering in terms of accuracy and efficiency, and potentially for other NLP tasks.


Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision

Cai, Yucheng, Liu, Hong, Ou, Zhijian, Huang, Yi, Feng, Junlan

arXiv.org Artificial Intelligence

Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database to get relevant knowledge to generate responses. In real-life applications, user utterances are noisier, and thus it is more difficult to accurately track dialog states and correctly secure relevant knowledge. Recent progress in question answering and document-grounded dialog systems has come from retrieval-augmented methods with a knowledge retriever. Inspired by such progress, we propose a retrieval-based method to enhance knowledge selection in TOD systems, which significantly outperforms the traditional database query method for real-life dialogs. Further, we develop latent variable model based semi-supervised learning, which can work with the knowledge retriever to leverage both labeled and unlabeled dialog data. The Joint Stochastic Approximation (JSA) algorithm is employed for semi-supervised model training, and the whole system is referred to as JSA-KRTOD. Experiments are conducted on a real-life dataset from China Mobile Custom-Service, called MobileCS, and show that JSA-KRTOD achieves superior performance in both labeled-only and semi-supervised settings.


Q-TOD: A Query-driven Task-oriented Dialogue System

Tian, Xin, Lin, Yingzhan, Song, Mengfei, Bao, Siqi, Wang, Fan, He, Huang, Sun, Shuqi, Wu, Hua

arXiv.org Artificial Intelligence

Existing pipelined task-oriented dialogue systems usually have difficulties adapting to unseen domains, whereas end-to-end systems are plagued by large-scale knowledge bases in practice. In this paper, we introduce a novel query-driven task-oriented dialogue system, namely Q-TOD. The essential information from the dialogue context is extracted into a query, which is further employed to retrieve relevant knowledge records for response generation. Firstly, as the query is in the form of natural language and not confined to the schema of the knowledge base, the issue of domain adaption is alleviated remarkably in Q-TOD. Secondly, as the query enables the decoupling of knowledge retrieval from the generation, Q-TOD gets rid of the issue of knowledge base scalability. To evaluate the effectiveness of the proposed Q-TOD, we collect query annotations for three publicly available task-oriented dialogue datasets. Comprehensive experiments verify that Q-TOD outperforms strong baselines and establishes a new state-of-the-art performance on these datasets.
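The query-driven flow described above (extract a natural-language query from the dialogue context, then use it to retrieve knowledge records) can be sketched roughly as follows. This is a toy illustration, not the Q-TOD implementation: the function names `extract_query` and `retrieve` are hypothetical, the query extractor simply reuses the last user turn where the paper would use a trained model, and retrieval is plain token overlap rather than a learned retriever.

```python
def extract_query(turns):
    """Stand-in for a learned query generator (assumption):
    return the text of the most recent user turn."""
    for turn in reversed(turns):
        speaker, _, text = turn.partition(":")
        if speaker.strip().lower() == "user":
            return text.strip()
    return ""


def retrieve(query, records, top_k=1):
    """Rank knowledge records by token overlap with the query.

    The query is free text, so it is matched against record values
    without referring to the record schema (field names)."""
    query_tokens = set(query.lower().split())
    scored = []
    for record in records:
        record_text = " ".join(str(value) for value in record.values()).lower()
        overlap = len(query_tokens & set(record_text.split()))
        scored.append((overlap, record))
    # Stable sort keeps the original order among records with equal overlap.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for score, record in scored[:top_k] if score > 0]


# Hypothetical knowledge base and dialogue, for illustration only.
knowledge_base = [
    {"name": "Golden Wok", "food": "chinese", "area": "north"},
    {"name": "Pizza Hut", "food": "italian", "area": "centre"},
]
dialogue = [
    "user: I am looking for somewhere to eat",
    "system: What type of food would you like?",
    "user: an italian restaurant in the centre",
]

query = extract_query(dialogue)
results = retrieve(query, knowledge_base)
```

Because retrieval only sees the query string, the retriever can be swapped or the knowledge base enlarged without touching the response generator, which is the decoupling the abstract refers to.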


Context-Aware Attentive Knowledge Tracing

Ghosh, Aritra, Heffernan, Neil, Lan, Andrew S.

arXiv.org Artificial Intelligence

Knowledge tracing (KT) refers to the problem of predicting future learner performance given their past performance in educational applications. Recent developments in KT using flexible deep neural network-based models excel at this task. However, these models often offer limited interpretability, thus making them insufficient for personalized learning, which requires using interpretable feedback and actionable recommendations to help learners achieve better learning outcomes. In this paper, we propose attentive knowledge tracing (AKT), which couples flexible attention-based neural network models with a series of novel, interpretable model components inspired by cognitive and psychometric models. AKT uses a novel monotonic attention mechanism that relates a learner's future responses to assessment questions to their past responses; attention weights are computed using exponential decay and a context-aware relative distance measure, in addition to the similarity between questions. Moreover, we use the Rasch model to regularize the concept and question embeddings; these embeddings are able to capture individual differences among questions on the same concept without using an excessive number of parameters. We conduct experiments on several real-world benchmark datasets and show that AKT outperforms existing KT methods (by up to 6% in AUC in some cases) on predicting future learner responses. We also conduct several case studies and show that AKT exhibits excellent interpretability and thus has potential for automated feedback and personalization in real-world educational settings.
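To make the idea of attention with exponential decay more concrete, here is a minimal sketch under simplifying assumptions. It is not the AKT implementation: the distance between steps is taken to be the plain difference in position rather than the context-aware measure the abstract mentions, each step is allowed to attend to itself as well as to earlier steps, and the function name `decayed_attention` and the decay rate `theta` are illustrative.

```python
import math


def decayed_attention(queries, keys, theta=0.5):
    """Causal attention weights with exponential decay over position distance.

    queries, keys: lists of equal-length float vectors, one per time step.
    theta: non-negative decay rate (hypothetical default).
    Returns a list of rows; row t holds weights over steps 0..t.
    """
    dim = len(queries[0])
    weights = []
    for t, query in enumerate(queries):
        scores = []
        for tau in range(t + 1):  # only the current and earlier steps
            similarity = sum(a * b for a, b in zip(query, keys[tau])) / math.sqrt(dim)
            decay = math.exp(-theta * (t - tau))  # older steps are scaled down
            scores.append(decay * similarity)
        # Numerically stable softmax over the decayed scores.
        peak = max(scores)
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        weights.append([value / total for value in exps])
    return weights


# Three identical steps: any difference in weight comes from the decay alone.
steps = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
attn = decayed_attention(steps, steps, theta=1.0)
```

With identical queries and keys, the most recent step receives the largest weight in each row, and setting `theta` to zero recovers uniform weights, which isolates the effect of the decay term.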