PaLM: Scaling Language Modeling with Pathways
Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek; Barnes, Parker; Tay, Yi; Shazeer, Noam; Prabhakaran, Vinodkumar; Reif, Emily; Du, Nan; Hutchinson, Ben; Pope, Reiner; Bradbury, James; Austin, Jacob; Isard, Michael; Gur-Ari, Guy; Yin, Pengcheng; Duke, Toju; Levskaya, Anselm; Ghemawat, Sanjay; Dev, Sunipa; Michalewski, Henryk; Garcia, Xavier; Misra, Vedant; Robinson, Kevin; Fedus, Liam; Zhou, Denny; Ippolito, Daphne; Luan, David; Lim, Hyeontaek; Zoph, Barret; Spiridonov, Alexander; Sepassi, Ryan; Dohan, David; Agrawal, Shivani; Omernick, Mark; Dai, Andrew M.; Pillai, Thanumalayan Sankaranarayana; Pellat, Marie; Lewkowycz, Aitor; Moreira, Erica; Child, Rewon; Polozov, Oleksandr; Lee, Katherine; Zhou, Zongwei; Wang, Xuezhi; Saeta, Brennan; Diaz, Mark; Firat, Orhan; Catasta, Michele; Wei, Jason; Meier-Hellstern, Kathy; Eck, Douglas; Dean, Jeff; Petrov, Slav; Fiedel, Noah
arXiv.org Artificial Intelligence
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and potential mitigation strategies.
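As a rough illustration of the few-shot setting the abstract refers to, the sketch below assembles a prompt from a handful of solved examples followed by one unsolved query. The `Q:`/`A:` layout, the function name `build_few_shot_prompt`, and the arithmetic examples are assumptions chosen for illustration; they are not taken from the paper and do not reflect the prompt formats used in its evaluations.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Join solved (question, answer) pairs and one open query into a prompt.

    The "Q:/A:" layout is a hypothetical format used only for illustration.
    """
    # Each solved example shows the model the task's input/output pattern.
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Two in-context examples (a "2-shot" prompt) and one new question.
examples = [("What is 2 + 3?", "5"), ("What is 7 - 4?", "3")]
prompt = build_few_shot_prompt(examples, "What is 6 + 1?")
print(prompt)
```

No model weights are updated in this setting: the task is conveyed entirely through the examples placed in the prompt, which is why so few task-specific examples are needed.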
Oct-5-2022
- Country:
- Pacific Ocean (0.04)
- North America
- Dominican Republic (0.04)
- United States
- South Dakota (0.04)
- Oklahoma (0.04)
- Virginia (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Washington > King County
- Seattle (0.13)
- California
- San Francisco County > San Francisco (0.14)
- San Diego County > San Diego (0.04)
- Indiana > Bartholomew County
- Columbus (0.04)
- New York > New York County
- New York City (0.04)
- Canada > British Columbia
- Europe
- France (0.04)
- Sweden (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy
- Tuscany > Florence (0.04)
- Calabria > Catanzaro Province
- Catanzaro (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Germany > Saarland
- Saarbrücken (0.04)
- Asia
- Russia (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.14)
- Chūbu > Toyama Prefecture
- Toyama (0.04)
- India
- West Bengal (0.04)
- Maharashtra > Mumbai (0.04)
- Gujarat (0.04)
- Genre:
- Research Report > New Finding (0.45)
- Industry:
- Leisure & Entertainment (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- Education (1.00)
- Information Technology (0.92)
- Government (0.67)
- Law Enforcement & Public Safety > Terrorism (0.45)
- Energy > Renewable (0.45)
- Transportation > Air (0.45)
- Media > News (0.45)
- Technology:
- Information Technology > Artificial Intelligence
- Representation & Reasoning > Commonsense Reasoning (0.92)
- Cognitive Science > Problem Solving (0.92)
- Natural Language
- Large Language Model (1.00)
- Chatbot (1.00)
- Machine Translation (0.93)
- Text Processing (0.92)
- Machine Learning > Neural Networks
- Deep Learning (1.00)