Padthe, Karthik
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Yuan, Weizhe, Yu, Jane, Jiang, Song, Padthe, Karthik, Li, Yang, Wang, Dong, Kulikov, Ilia, Cho, Kyunghyun, Tian, Yuandong, Weston, Jason E, Li, Xian
Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding.
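The distillation recipe the abstract describes can be pictured as sampling a strong teacher on each question and keeping high-quality answers as fine-tuning data. The sketch below is a hypothetical illustration of that loop, not the paper's implementation; `teacher` and `score` are placeholder callables standing in for the teacher model and the reward/reference-answer check.

```python
# Hypothetical sketch of teacher-model distillation over a question set:
# sample several answers per question, keep the best-scoring one, and
# emit (question, answer) pairs for supervised fine-tuning of a student.
# `teacher` and `score` are placeholders, not the paper's actual components.

def build_distillation_set(questions, teacher, score, min_score=0.5, samples=4):
    """Sample teacher answers per question and keep the best-scoring one."""
    dataset = []
    for q in questions:
        candidates = [teacher(q) for _ in range(samples)]
        best = max(candidates, key=lambda a: score(q, a))
        if score(q, best) >= min_score:  # reward- or reference-based filter
            dataset.append({"question": q, "answer": best})
    return dataset
```

Replacing `score` with an external reward model (or the model scoring its own samples) turns the same loop into the unsupervised self-training setup the abstract mentions.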
Improving Factuality with Explicit Working Memory
Chen, Mingda, Li, Yang, Padthe, Karthik, Shao, Rulin, Sun, Alicia, Zettlemoyer, Luke, Ghosh, Gargi, Yih, Wen-tau
In the realm of long-form text generation, a notable vulnerability of large language models (LLMs) is their propensity for hallucination, wherein the generated text contains factually inaccurate information. By prepending the input prompt with relevant documents from trustworthy sources, retrieval-augmented generation (RAG) (Lewis et al., 2020; Shi et al., 2024) has been shown to be a simple yet effective approach that substantially mitigates the hallucination issue. To further enhance the factual accuracy of model output, various iterative prompting methods have been proposed that build upon RAG. For instance, FLARE (Jiang et al., 2023) generates responses sentence by sentence, and if a newly generated sentence contains low-probability tokens, it retrieves a new set of documents and re-runs RAG to regenerate the sentence. Alternatively, Self-RAG (Asai et al., 2024) employs a self-critic component to verify the correctness of each partial generation and repeatedly queries a retrieval system to update the background knowledge, thereby producing more accurate and faithful responses. While these systems demonstrate significant empirical improvements, they remain restricted to the traditional RAG design: context-relevant knowledge obtained through retrieval is the only online feedback to the model, incorporated solely as part of the input string.
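The FLARE-style loop described above can be sketched in a few lines: generate one sentence at a time, and if any token's probability falls below a threshold, re-query the retriever and regenerate that sentence against the refreshed context. This is a minimal illustration of the control flow only; `generate_sentence` and `retrieve` are hypothetical placeholder functions, not the actual FLARE API.

```python
# Hypothetical sketch of FLARE-style confidence-gated retrieval
# (Jiang et al., 2023): regenerate any sentence whose tokens fall
# below a probability threshold after fetching fresh documents.
# `generate_sentence` and `retrieve` are placeholders, not a real API.

def flare_generate(prompt, generate_sentence, retrieve, threshold=0.4, max_sents=10):
    """Build a response sentence by sentence with confidence-gated retrieval."""
    context = retrieve(prompt)  # initial RAG step: prepend retrieved docs
    response = []
    for _ in range(max_sents):
        sentence, token_probs = generate_sentence(context, prompt, response)
        if sentence is None:  # model signalled end of generation
            break
        if min(token_probs) < threshold:
            # Low-confidence tokens: re-query the retriever using the draft
            # sentence, then regenerate it against the refreshed context.
            context = retrieve(prompt + " " + sentence)
            sentence, token_probs = generate_sentence(context, prompt, response)
        response.append(sentence)
    return " ".join(response)
```

Note how retrieval here is still the only feedback channel, delivered through the input string, which is exactly the limitation the paper's explicit working-memory design sets out to relax.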
An Introduction to Vision-Language Modeling
Bordes, Florian, Pang, Richard Yuanzhe, Ajay, Anurag, Li, Alexander C., Bardes, Adrien, Petryk, Suzanne, Mañas, Oscar, Lin, Zhiqiu, Mahmoud, Anas, Jayaraman, Bargav, Ibrahim, Mark, Hall, Melissa, Xiong, Yunyang, Lebensold, Jonathan, Ross, Candace, Jayakumar, Srihari, Guo, Chuan, Bouchacourt, Diane, Al-Tahan, Haider, Padthe, Karthik, Sharma, Vasu, Xu, Hu, Tan, Xiaoqing Ellen, Richards, Megan, Lavoie, Samuel, Astolfi, Pietro, Hemmat, Reyhane Askari, Chen, Jun, Tirumala, Kushal, Assouel, Rim, Moayeri, Mazda, Talattof, Arjang, Chaudhuri, Kamalika, Liu, Zechun, Chen, Xilun, Garrido, Quentin, Ullrich, Karen, Agrawal, Aishwarya, Saenko, Kate, Celikyilmaz, Asli, Chandra, Vikas
Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.
Text Quality-Based Pruning for Efficient Training of Language Models
Sharma, Vasu, Padthe, Karthik, Ardalani, Newsha, Tirumala, Kushal, Howes, Russell, Xu, Hu, Huang, Po-Yao, Li, Shang-Wen, Aghajanyan, Armen, Ghosh, Gargi, Zettlemoyer, Luke
Language models have gained attention in recent years due to their impressive performance in various natural language processing (NLP) tasks (Zhang et al., 2022; Penedo et al., 2023; Touvron et al., 2023; Zhou et al., 2023; Liu et al., 2019). However, their training process often relies on computationally intensive procedures involving massive datasets and compute requirements, which hinders training large-scale LMs on noisy real-world or domain-specific datasets. Worse, several of these datasets are uncurated and may contain harmful content which the LM can pick up during training (Deshpande et al., 2023; Schramowski et al., 2022; Kuchnik et al., 2023). We assign each text instance a numerical text quality score and demonstrate how it can be used to prune the original dataset, enabling the training of LMs using only a fraction of the data. Our approach aims to identify and eliminate low-quality text instances, thereby streamlining the training process and mitigating the burden of handling large-scale datasets. We also remove potentially harmful content from the data by ensuring that such content is rated poorly by our text quality score, so that it can then be pruned. We observe an absolute improvement of 0.9% averaged over 14 downstream evaluation tasks for multiple LM models while using 40% less data.