Goto

Collaborating Authors

 Sachan, Devendra Singh


Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

arXiv.org Artificial Intelligence

Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.


Questions Are All You Need to Train a Dense Passage Retriever

arXiv.org Artificial Intelligence

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.


Improving Passage Retrieval with Zero-Shot Question Generation

arXiv.org Artificial Intelligence

Queries and documents of query scoring with count-based language are typically embedded in a shared representation models (Zhai and Lafferty, 2001). However, instead space to enable efficient search, before using of estimating a language model from each a task-specific model to perform a deeper, tokenlevel passage, UPR uses pre-trained language models document analysis (e.g. a document reader (PLMs). More recent work on re-rankers have finetuned that selects an answer span). We show that adding PLMs on question-passage pairs to generate a zero-shot re-ranker to the retrieval stage of such relevance labels (Nogueira et al., 2020), sometimes models leads to large gains in performance, by doing to jointly generate question and relevance deep token-level analysis with no task-specific labels (Nogueira dos Santos et al., 2020; Ju et al., data or tuning.


End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

arXiv.org Artificial Intelligence

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.


End-to-End Training of Neural Retrievers for Open-Domain Question Answering

arXiv.org Artificial Intelligence

Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse Cloze Task and masked salient spans, followed by supervised finetuning using question-context pairs. This approach leads to absolute gains of 2+ points over the previous best result in the top-20 retrieval accuracy on Natural Questions and TriviaQA datasets. We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models. In the first approach, the reader considers each retrieved document separately while in the second approach, the reader considers all the retrieved documents together. Our experiments demonstrate the effectiveness of these approaches as we obtain new state-of-the-art results. On the Natural Questions dataset, we obtain a top-20 retrieval accuracy of 84, an improvement of 5 points over the recent DPR model. In addition, we achieve good results on answer extraction, outperforming recent models like REALM and RAG by 3+ points. We further scale up end-to-end training to large models and show consistent gains in performance over smaller models.


Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation

arXiv.org Artificial Intelligence

We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks that transforms any inputs into natural language, such as machine translation, summarization, dialog, content manipulation, and so forth. With the design goals of modularity, versatility, and extensibility in mind, Texar extracts common patterns underlying the diverse tasks and methodologies, creates a library of highly reusable modules and functionalities, and allows arbitrary model architectures and algorithmic paradigms. In Texar, model architecture, losses, and learning processes are fully decomposed. Modules at high concept level can be freely assembled or plugged in/swapped out. These features make Texar particularly suitable for researchers and practitioners to do fast prototyping and experimentation, as well as foster technique sharing across different text generation tasks. We provide case studies to demonstrate the use and advantage of the toolkit. Texar is released under Apache license 2.0 at https://github.com/asyml/texar.


Sports Video Classification from Multimodal Information Using Deep Neural Networks

AAAI Conferences

The work presents a methodology for classification of sports videos using both audio and visual information by applying deep learning algorithms. We show a methodology to combine multiple deep learning architectures through higher layers. Our method learns two separate models trained on audio and visual part of the data. We have trained the model for the audio part of the multimedia input using two stacked layers of CRBMs forminga CDBN. We also train two layered ISA network to extract features from video part of the data. We then train deep stacked autoencoder over both audio and visual features with discriminative fine tuning. Our results show that by combining both audio and visual features we get better accuracy as compared to single type of features.