Appendix A Source codes

Neural Information Processing Systems

Specifically, we average the scores over 100 episodes evaluated on confounded environments for each random seed. We use the Adam optimizer with a learning rate of 3e-4. Note that the other regularization baselines are based on BC. In particular, OREO achieves a mean HNS of 114.9%. Figure 9 compares OREO to CCIL with environment interaction on 6 confounded Atari environments. We also investigate the possibility of applying OREO to other IL methods.



HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

Gokdemir, Ozan, Siebenschuh, Carlo, Brace, Alexander, Wells, Azton, Hsu, Brian, Hippe, Kyle, Setty, Priyanka V., Ajith, Aswathy, Pauloski, J. Gregory, Sastry, Varuni, Foreman, Sam, Zheng, Huihuo, Ma, Heng, Kale, Bharat, Chia, Nicholas, Gibbs, Thomas, Papka, Michael E., Brettin, Thomas, Alexander, Francis J., Anandkumar, Anima, Foster, Ian, Stevens, Rick, Vishwanath, Venkatram, Ramanathan, Arvind

arXiv.org Artificial Intelligence

The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm that enhances retrieval accuracy by using contrastive learning and late-interaction techniques. HiPerRAG delivers robust performance on existing scientific question answering benchmarks and two new benchmarks introduced in this work, achieving 90% accuracy on SciQ and 76% on PubMedQA, outperforming both domain-specific models like PubMedGPT and commercial LLMs such as GPT-4. Scaling to thousands of GPUs on the Polaris, Sunspot, and Frontier supercomputers, HiPerRAG delivers million-document-scale RAG workflows for unifying scientific knowledge and fostering interdisciplinary innovation.
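The abstract does not detail ColTrast's late-interaction scoring, but late interaction is commonly implemented ColBERT-style: each query token embedding is matched against its most similar document token embedding, and these per-token maxima are summed. A minimal sketch of that scoring rule (function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score: for each query token
    embedding, take its maximum cosine similarity over all document
    token embeddings, then sum over query tokens."""
    # L2-normalize rows so dot products equal cosine similarities
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, summed
```

Because each token keeps its own embedding rather than being pooled into one vector, this score rewards documents that cover every part of the query, which pairs naturally with the contrastive fine-tuning the abstract describes.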


Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation

Li, Sha, Ramakrishnan, Naren

arXiv.org Artificial Intelligence

Despite the remarkable capabilities of Large Language Models (LLMs) in various NLP tasks, they remain vulnerable to hallucinations due to their limited parametric knowledge and lack of domain-specific expertise. Retrieval-Augmented Generation (RAG) addresses this challenge by incorporating external document retrieval to augment the knowledge base of LLMs. In this approach, RAG retrieves document chunks from an external corpus in response to a query, which are then used as context for the downstream language model to generate an answer. However, these retrieved knowledge sources often include irrelevant or erroneous information, undermining the effectiveness of RAG in downstream tasks. To overcome this limitation, we introduce a compact, efficient, and pluggable module designed to refine external knowledge sources before feeding them to the generator. The module reconstructs retrieved content by extracting the most relevant and supportive information and reorganising it into a concise, query-specific format. Through a three-stage training paradigm - comprising supervised fine-tuning, contrastive multi-task learning, and reinforcement learning-based alignment - it prioritises critical knowledge and aligns it with the generator's preferences. This method enables LLMs to produce outputs that are more accurate, reliable, and contextually appropriate.
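The pipeline the abstract describes places the reconstructor between retrieval and generation: retrieved chunks are distilled into a concise, query-specific context before reaching the generator. A toy stand-in for that stage (the learned module is replaced here by a simple word-overlap ranker; all names are illustrative):

```python
def refine_context(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Toy stand-in for a learned context reconstructor: rank retrieved
    chunks by word overlap with the query and keep the top_k, yielding
    a concise, query-specific context for the downstream generator."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return " ".join(scored[:top_k])
```

In the paper's design this refinement is learned (supervised fine-tuning, contrastive multi-task learning, and RL-based alignment) rather than heuristic, but the plug-in position in the RAG pipeline is the same.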


Offline Reinforcement Learning for LLM Multi-Step Reasoning

Wang, Huaijie, Hao, Shibo, Dong, Hanze, Zhang, Shenao, Bao, Yilin, Yang, Ziran, Wu, Yi

arXiv.org Artificial Intelligence

Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily available for multi-step reasoning tasks, and (2) it treats all tokens uniformly, making it ineffective for credit assignment in multi-step reasoning tasks, which often come with sparse rewards. In this work, we propose OREO (Offline Reasoning Optimization), an offline RL method for enhancing LLM multi-step reasoning. Building on insights from previous work on maximum entropy reinforcement learning, it jointly learns a policy model and value function by optimizing the soft Bellman equation. We show in principle that it reduces the need to collect pairwise data and enables better credit assignment. Empirically, OREO surpasses existing offline learning methods on multi-step reasoning benchmarks, including mathematical reasoning tasks (GSM8K, MATH) and embodied agent control (ALFWorld). The approach can be extended to a multi-iteration framework when additional resources are available. Furthermore, the learned value function can be leveraged to guide the tree search for free, which can further boost performance during test time.
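For reference, the soft Bellman equation from maximum-entropy RL that the abstract alludes to takes the following standard form (OREO's exact objective may differ in detail; see the paper):

```latex
Q_{\mathrm{soft}}(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\left[ V_{\mathrm{soft}}(s_{t+1}) \right],
\qquad
V_{\mathrm{soft}}(s_t) = \alpha \log \sum_{a'} \exp\!\left( \frac{Q_{\mathrm{soft}}(s_t, a')}{\alpha} \right),
```

where $\alpha$ is the entropy temperature. Jointly fitting a policy and value function to satisfy this consistency condition is what lets the method assign credit at the token level without paired preference data.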


Nine AI Chatbots You Can Play With Right Now

The Atlantic - Technology

If you believe in the multibillion-dollar valuations, the prognostications from some of tech's most notable figures, and the simple magic of getting a computer to do your job for you, then you might say we're at the start of the chatbot era. Last November, OpenAI released ChatGPT into the unsuspecting world: It became the fastest-growing consumer app in history and immediately seemed to reconfigure how people think of conversational programs. Chatbots have existed for decades, but they haven't seemed especially intelligent--nothing like the poetry-writing, email-summarizing machines that have sprouted up recently. OpenAI has defined the moment, but there are plenty of competitors, including major players such as Google and Meta and lesser-known start-ups such as Anthropic. This cheat sheet tracks some of the most notable chatbot contenders through a few metrics: Can you actually use them? Do they contain glaring flaws?


I Have Questions for ChatGPT

The New Yorker

ChatGPT enables users to ask questions or tell a story, and the bot will respond with relevant, natural-sounding answers and topics. A friend gifted me a fancy designer bucket hat that she swore she didn't want anymore. Then we had a misunderstanding, and she ghosted my birthday party. And put a potato in her tailpipe. And slept with her ex.


Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Park, Jongjin, Seo, Younggyo, Liu, Chang, Zhao, Li, Qin, Tao, Shin, Jinwoo, Liu, Tie-Yan

arXiv.org Artificial Intelligence

Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations. However, behavioral cloning often suffers from the causal confusion problem where a policy relies on the noticeable effect of expert actions due to the strong correlation but not the cause we desire. This paper presents Object-aware REgularizatiOn (OREO), a simple technique that regularizes an imitation policy in an object-aware manner. Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions. To this end, we introduce a two-stage approach: (a) we extract semantic objects from images by utilizing discrete codes from a vector-quantized variational autoencoder, and (b) we randomly drop the units that share the same discrete code together, i.e., masking out semantic objects. Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment. We also show that our method even outperforms inverse reinforcement learning methods trained with a considerable amount of environment interaction.
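The second stage of the method, dropping all feature units that share a VQ-VAE discrete code so that whole semantic objects are masked together, can be sketched as follows (shapes and the function name are illustrative; the paper applies this inside the policy network):

```python
import numpy as np

def object_aware_dropout(features, codes, drop_prob=0.5, rng=None):
    """Sketch of OREO-style regularization. `features` is an (H, W, C)
    feature map; `codes` is an (H, W) grid of discrete VQ-VAE code
    indices. Each unique code is dropped with probability `drop_prob`,
    and every spatial unit carrying a dropped code is zeroed together,
    hiding entire semantic objects from the policy."""
    if rng is None:
        rng = np.random.default_rng()
    unique_codes = np.unique(codes)
    dropped = unique_codes[rng.random(unique_codes.size) < drop_prob]
    mask = ~np.isin(codes, dropped)    # False wherever the unit's code was dropped
    return features * mask[..., None]  # broadcast the (H, W) mask over channels
```

Compared with standard dropout, which zeroes units independently, grouping the mask by discrete code prevents the policy from recovering a masked object from its neighboring units, which is the mechanism the abstract credits for reducing causal confusion.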


How Voice Will Capture Connected Commerce PYMNTS.com

#artificialintelligence

A massive sea change has happened in the world of retail, albeit so subtly and swiftly that most consumers probably never even felt it happen. Shopping went from being a discrete, defined, daily (or weekly or monthly) activity to something that has become like the background noise of modern interaction. As recently as the turn of the century, "going shopping" meant exactly that for over 90 percent of consumers: getting in a car and physically going someplace to make purchases. Flash forward to the closing months of the second decade of the 21st century, and shopping is not so much a thing that consumers go do as something that happens in the background of everything else customers are already doing. According to the 2019 edition of the PYMNTS How We Will Pay Study, the average consumer does about 12 activities over the course of a day, and makes a purchase during about four of them.