Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Rae, Jack W., Borgeaud, Sebastian, Cai, Trevor, Millican, Katie, Hoffmann, Jordan, Song, Francis, Aslanides, John, Henderson, Sarah, Ring, Roman, Young, Susannah, Rutherford, Eliza, Hennigan, Tom, Menick, Jacob, Cassirer, Albin, Powell, Richard, Driessche, George van den, Hendricks, Lisa Anne, Rauh, Maribeth, Huang, Po-Sen, Glaese, Amelia, Welbl, Johannes, Dathathri, Sumanth, Huang, Saffron, Uesato, Jonathan, Mellor, John, Higgins, Irina, Creswell, Antonia, McAleese, Nat, Wu, Amy, Elsen, Erich, Jayakumar, Siddhant, Buchatskaya, Elena, Budden, David, Sutherland, Esme, Simonyan, Karen, Paganini, Michela, Sifre, Laurent, Martens, Lena, Li, Xiang Lorraine, Kuncoro, Adhiguna, Nematzadeh, Aida, Gribovskaya, Elena, Donato, Domenic, Lazaridou, Angeliki, Mensch, Arthur, Lespiau, Jean-Baptiste, Tsimpoukelli, Maria, Grigorev, Nikolai, Fritz, Doug, Sottiaux, Thibault, Pajarskas, Mantas, Pohlen, Toby, Gong, Zhitao, Toyama, Daniel, d'Autume, Cyprien de Masson, Li, Yujia, Terzi, Tayfun, Mikulik, Vladimir, Babuschkin, Igor, Clark, Aidan, Casas, Diego de Las, Guy, Aurelia, Jones, Chris, Bradbury, James, Johnson, Matthew, Hechtman, Blake, Weidinger, Laura, Gabriel, Iason, Isaac, William, Lockhart, Ed, Osindero, Simon, Rimell, Laura, Dyer, Chris, Vinyals, Oriol, Ayoub, Kareem, Stanway, Jeff, Bennett, Lorrayne, Hassabis, Demis, Kavukcuoglu, Koray, Irving, Geoffrey
–arXiv.org Artificial Intelligence
Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
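The link between cross entropy and compression rate mentioned above can be illustrated with a minimal sketch. It is not taken from the paper. For simplicity it assumes a character-level unigram model fitted on the very text it scores, rather than an autoregressive model, and an idealised entropy coder that spends exactly -log2 p(c) bits on each character c. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(text):
    """Fit a character-level unigram model: p(c) = count(c) / len(text)."""
    counts = Counter(text)
    total = len(text)
    return {ch: n / total for ch, n in counts.items()}


def cross_entropy_bits(model, text):
    """Average of -log2 p(c) over the characters of text (bits per character)."""
    return sum(-math.log2(model[ch]) for ch in text) / len(text)


def ideal_code_length_bits(model, text):
    """Total bits an idealised entropy coder would spend: sum of -log2 p(c)."""
    return sum(-math.log2(model[ch]) for ch in text)


# Assumption: a toy string stands in for a real corpus.
text = "abracadabra"
model = unigram_model(text)

bits_per_char = cross_entropy_bits(model, text)
total_bits = ideal_code_length_bits(model, text)

# Compression rate relative to a raw encoding at 8 bits per character.
compression_rate = total_bits / (8 * len(text))

print(f"cross entropy: {bits_per_char:.3f} bits/char")
print(f"ideal code length: {total_bits:.3f} bits")
print(f"compression rate vs. 8-bit encoding: {compression_rate:.3f}")
```

Under these assumptions the total code length is the cross entropy multiplied by the text length, so a lower cross entropy means proportionally fewer bits to transmit. A practical coder such as arithmetic coding only approaches this ideal length, so the figures here are a lower bound, not a measured file size.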
Dec-8-2021
- Country:
- South America
- Chile (0.04)
- French Guiana (0.04)
- Oceania
- North America
- Dominican Republic (0.04)
- United States
- New York (0.04)
- California (0.04)
- Texas > Travis County
- Austin (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Puerto Rico > San Juan
- San Juan (0.04)
- Europe
- Austria (0.04)
- Czechia > Prague (0.04)
- Finland (0.04)
- Monaco (0.04)
- France (0.04)
- Holy See > Vatican City (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- United Kingdom
- Asia
- Pakistan (0.04)
- China > Hong Kong (0.04)
- Middle East
- Jordan (0.04)
- Iraq (0.04)
- Syria > Damascus Governorate
- Damascus (0.04)
- Iran > Tehran Province
- Tehran (0.04)
- Japan > Honshū
- Kansai > Osaka Prefecture
- Osaka (0.04)
- Chūbu > Toyama Prefecture
- Toyama (0.04)
- Africa > Middle East
- Libya (0.04)
- Genre:
- Personal (1.00)
- Overview (1.00)
- Research Report
- New Finding (1.00)
- Promising Solution (0.67)
- Industry:
- Leisure & Entertainment (1.00)
- Law (1.00)
- Media (1.00)
- Energy (0.92)
- Information Technology > Security & Privacy (0.67)
- Banking & Finance > Economy (0.67)
- Health & Medicine
- Government
- Education
- Curriculum > Subject-Specific Education (1.00)
- Educational Setting (0.68)
- Technology: