AITopics | non-parametric memory

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Neural Information Processing Systems | Dec-24-2025, 03:42:55 GMT

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
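The two formulations compared in the abstract above can be read as two ways of marginalizing over retrieved passages: once per whole sequence, or separately at every token. The following is a minimal sketch of that reading, not the authors' implementation. The function names and the toy probabilities are assumptions made for illustration; a real system would obtain these values from a retriever and a seq2seq generator.

```python
from math import prod

def rag_sequence(doc_probs, token_probs):
    """Sequence-level mixture: one passage is used for the whole output.

    doc_probs[z]      : retriever probability of passage z given the input
    token_probs[z][i] : generator probability of target token i given passage z
    """
    return sum(p_z * prod(token_probs[z]) for z, p_z in enumerate(doc_probs))

def rag_token(doc_probs, token_probs):
    """Token-level mixture: passages are marginalized at every position."""
    n_tokens = len(token_probs[0])
    return prod(
        sum(p_z * token_probs[z][i] for z, p_z in enumerate(doc_probs))
        for i in range(n_tokens)
    )

# Toy numbers (invented): two passages, two target tokens.
# Passage 0 supports the first token, passage 1 supports the second.
doc_probs = [0.5, 0.5]
token_probs = [[0.9, 0.1], [0.1, 0.9]]

seq = rag_sequence(doc_probs, token_probs)  # 0.5*0.9*0.1 + 0.5*0.1*0.9
tok = rag_token(doc_probs, token_probs)     # (0.45 + 0.05) * (0.05 + 0.45)
print(f"sequence-level: {seq:.4f}, token-level: {tok:.4f}")
```

In this toy case the token-level mixture assigns higher probability, because each token can draw on a different passage, while the sequence-level mixture has to commit to one passage for both tokens.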

knowledge-intensive nlp task, name change, retrieval-augmented generation, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis, Patrick

Neural Information Processing Systems | Oct-3-2025, 04:07:37 GMT

However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures.

computational linguistic, machine learning, question answering, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Peru (0.14)
North America > Canada (0.04)
(12 more...)

Genre: Research Report (0.30)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.82)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)


Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Neural Information Processing Systems | Jan-25-2025, 09:42:42 GMT

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation.

knowledge-intensive nlp task, non-parametric memory, retrieval-augmented generation, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)


Human-inspired Perspectives: A Survey on AI Long-term Memory

He, Zihong, Lin, Weizhe, Zheng, Hao, Zhang, Fan, Jones, Matt W., Aitchison, Laurence, Xu, Xuhai, Liu, Miao, Kristensson, Per Ola, Shen, Junxiao

arXiv.org Artificial Intelligence | Jan-12-2025

With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term memory capabilities, formulates a theoretical framework, and inspires the development of next-generation AI long-term memory systems. This paper begins by introducing the mechanisms of human long-term memory, then explores AI long-term memory mechanisms, establishing a mapping between the two. Based on the mapping relationships identified, we extend the current cognitive architectures and propose the Cognitive Architecture of Self-Adaptive Long-term Memory (SALM). SALM provides a theoretical framework for the practice of AI long-term memory and holds potential for guiding the creation of next-generation long-term memory driven AI systems. Finally, we delve into the future directions and application prospects of AI long-term memory.

evolutionary algorithm, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.00489

Country:

Asia (0.67)
North America > United States (0.67)
Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (1.00)
(6 more...)


Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization

Rashiti, Gentiana, Karunaratne, Geethan, Sachan, Mrinmaya, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial Intelligence | Sep-12-2024

The retrieval augmented generation (RAG) system such as Retro has been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li that shows retrieval can also help using a small-scale database, but it demands more accurate and better neighbors when searching in a smaller hence sparser non-parametric memory. This can be met by using a proper semantic similarity search. We further propose adding a regularization to the non-parametric memory for the first time: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while causing noise in retrieving neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.00004

Country:

Asia (1.00)
North America > United States (0.28)
North America > Mexico (0.28)

Genre: Research Report (0.63)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)


Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Wang, Boshi, Yue, Xiang, Su, Yu, Sun, Huan

arXiv.org Artificial Intelligence | May-26-2024

We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison. We delve into the model's internals throughout training, conducting analytical experiments that reveal: 1) the mechanism behind grokking, such as the formation of the generalizing circuit and its relation to the relative efficiency of generalizing and memorizing circuits, and 2) the connection between systematicity and the configuration of the generalizing circuit. Our findings guide data and training setup to better induce implicit reasoning and suggest potential improvements to the transformer architecture, such as encouraging cross-layer knowledge sharing. Furthermore, we demonstrate that for a challenging reasoning task with a large search space, GPT-4-Turbo and Gemini-1.5-Pro

generalization, language model, transformer, (12 more...)

arXiv.org Artificial Intelligence

2405.15071

Country:

Asia > Middle East > Jordan (0.05)
Asia > Singapore (0.04)
North America > United States > Ohio (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

Mallen, Alex, Asai, Akari, Zhong, Victor, Das, Rajarshi, Khashabi, Daniel, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence | Jul-2-2023

Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.
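The idea of retrieving "only when necessary" can be sketched as a simple routing rule. The version below assumes the decision is made by thresholding a popularity score for the entity in the question, which fits the abstract's finding that LMs struggle with less popular facts. The function name, the score, the threshold value, and the two stand-in models are all hypothetical and are not taken from the paper's code.

```python
from typing import Callable

def answer_adaptively(
    question: str,
    popularity: float,
    threshold: float,
    lm_only: Callable[[str], str],
    lm_with_retrieval: Callable[[str], str],
) -> str:
    """Route long-tail questions to retrieval, popular ones to the bare LM."""
    if popularity < threshold:
        # Low popularity: parametric memory is assumed unreliable, so retrieve.
        return lm_with_retrieval(question)
    # High popularity: skip retrieval and save its inference cost.
    return lm_only(question)

# Toy stand-ins so the sketch runs without a model or an index.
lm_only = lambda q: "parametric:" + q
lm_with_retrieval = lambda q: "retrieved:" + q

popular = answer_adaptively("capital of France?", 9000.0, 100.0,
                            lm_only, lm_with_retrieval)
rare = answer_adaptively("director of an obscure film?", 12.0, 100.0,
                         lm_only, lm_with_retrieval)
print(popular)
print(rare)
```

Under this assumption the threshold would be tuned on held-out data, trading the cost of retrieval against accuracy on rare entities.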

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.10511

Country:

North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
North America > Canada (0.04)
Europe > Kosovo > District of Gjakova > Rahovec (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.77)


Semiparametric Language Models Are Scalable Continual Learners

Peng, Guangyue, Ge, Tao, Chen, Si-Qing, Wei, Furu, Wang, Houfeng

arXiv.org Artificial Intelligence | Mar-2-2023

Semiparametric language models (LMs) have shown promise in continuously learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional semiparametric LMs will finally become prohibitive for computing and storing if they are applied to continual learning over streaming data, because the non-parametric memory grows linearly with the amount of data they learn from over time. To address the issue of scalability, we present a simple and intuitive approach called Selective Memorization (SeMem), which only memorizes difficult samples that the model is likely to struggle with. We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways: (1) data-wise scalability: as the model becomes stronger through continual learning, it will encounter fewer difficult cases that need to be memorized, causing the growth of the non-parametric memory to slow down over time rather than growing at a linear rate with the size of training data; (2) model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart because it is rarer for a larger model to encounter incomprehensible cases, resulting in a non-parametric memory that does not scale linearly with model size. We conduct extensive experiments in language modeling and downstream tasks to test SeMem's results, showing SeMem enables a semiparametric LM to be a scalable continual learner with little forgetting.
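Selective memorization as described above can be illustrated with a small filter over a data stream: a sample enters the non-parametric memory only if the model finds it difficult. This sketch assumes that difficulty is measured by a per-sample loss compared against a fixed threshold. That criterion, the function name, and the toy loss values are assumptions for illustration, not the SeMem implementation.

```python
from typing import Callable, List, Optional, Tuple

Sample = Tuple[str, str]  # (context, target)

def selective_memorize(
    samples: List[Sample],
    loss_fn: Callable[[str, str], float],
    threshold: float,
    memory: Optional[List[Sample]] = None,
) -> List[Sample]:
    """Append only the samples whose loss exceeds the threshold."""
    memory = [] if memory is None else memory
    for context, target in samples:
        if loss_fn(context, target) > threshold:
            memory.append((context, target))
    return memory

# Toy loss table (invented): the model handles "easy" contexts well.
toy_losses = {"easy": 0.2, "hard": 3.1}
stream = [("easy", "a"), ("hard", "b"), ("easy", "c")]

memory = selective_memorize(stream, lambda c, t: toy_losses[c], threshold=1.0)
print(memory)
```

As the model improves, fewer samples cross the threshold, so the memory grows more slowly than the stream, which is the data-wise scalability the abstract describes.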

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.01421

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
North America > Dominican Republic (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)


Best Papers to Read on NLP

#artificialintelligence | Mar-9-2022, 02:34:56 GMT

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. This NLP research paper explores a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation.

downstream task, language model, non-parametric memory, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.36)


