AITopics | McAfee, Lawrence

Collaborating Authors

McAfee, Lawrence

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Wang, Boxin, Ping, Wei, McAfee, Lawrence, Xu, Peng, Li, Bo, Shoeybi, Mohammad, Catanzaro, Bryan

arXiv.org Artificial IntelligenceJan-31-2024

Pretraining auto-regressive large language models (LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. Specifically, we continue to pretrain a 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. Notably, the obtained foundation model, Retro 48B, largely outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the significant scaling potential of the method. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA and reading comprehension tasks, 10% over GPT across 4 challenging long-form QA tasks, and 16% over GPT across 3 summarization tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. Our results highlight the promising direction to obtain a better GPT decoder through continued pretraining with retrieval before instruction tuning. Our code and checkpoints are publicly available at: https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2310.07713

Genre: Research Report > New Finding (0.48)

Industry:

Media (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Retrieval meets Long Context Large Language Models

Xu, Peng, Ping, Wei, Wu, Xianchao, McAfee, Lawrence, Zhu, Chen, Liu, Zihan, Subramanian, Sandeep, Bakhturina, Evelina, Shoeybi, Mohammad, Catanzaro, Bryan

arXiv.org Artificial IntelligenceJan-23-2024

Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprietary 43B GPT and Llama2-70B. Perhaps surprisingly, we find that LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window via positional interpolation on long context tasks, while taking much less computation. More importantly, we demonstrate that retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes. Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks including question answering, query-based summarization, and in-context few-shot learning tasks. It also outperforms its non-retrieval Llama2-70B-32k baseline by a margin, while being much faster at generation. Our study provides general insights on the choice of retrieval-augmentation versus long context extension of LLM for practitioners. The long context large language models (LLM) have recently received a lot of attention in production (e.g., Anthropic, 2023; OpenAI, 2023b), research community (e.g., Chen et al., 2023; Liu et al., 2023; Tworkowski et al., 2023), and open source community (e.g., Kaiokendev, 2023).

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2310.03025

Country:

Europe (0.68)
North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.46)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

Wang, Boxin, Ping, Wei, Xu, Peng, McAfee, Lawrence, Liu, Zihan, Shoeybi, Mohammad, Dong, Yi, Kuchaiev, Oleksii, Li, Bo, Xiao, Chaowei, Anandkumar, Anima, Catanzaro, Bryan

arXiv.org Artificial IntelligenceDec-20-2023

Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT in both fine-tuning and zero-shot evaluation settings. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our code and model at: https://github.com/NVIDIA/Megatron-LM/blob/main/tools/retro/README.md

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2304.06762

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology (0.87)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback