Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Rae, Jack W., Borgeaud, Sebastian, Cai, Trevor, Millican, Katie, Hoffmann, Jordan, Song, Francis, Aslanides, John, Henderson, Sarah, Ring, Roman, Young, Susannah, Rutherford, Eliza, Hennigan, Tom, Menick, Jacob, Cassirer, Albin, Powell, Richard, Driessche, George van den, Hendricks, Lisa Anne, Rauh, Maribeth, Huang, Po-Sen, Glaese, Amelia, Welbl, Johannes, Dathathri, Sumanth, Huang, Saffron, Uesato, Jonathan, Mellor, John, Higgins, Irina, Creswell, Antonia, McAleese, Nat, Wu, Amy, Elsen, Erich, Jayakumar, Siddhant, Buchatskaya, Elena, Budden, David, Sutherland, Esme, Simonyan, Karen, Paganini, Michela, Sifre, Laurent, Martens, Lena, Li, Xiang Lorraine, Kuncoro, Adhiguna, Nematzadeh, Aida, Gribovskaya, Elena, Donato, Domenic, Lazaridou, Angeliki, Mensch, Arthur, Lespiau, Jean-Baptiste, Tsimpoukelli, Maria, Grigorev, Nikolai, Fritz, Doug, Sottiaux, Thibault, Pajarskas, Mantas, Pohlen, Toby, Gong, Zhitao, Toyama, Daniel, d'Autume, Cyprien de Masson, Li, Yujia, Terzi, Tayfun, Mikulik, Vladimir, Babuschkin, Igor, Clark, Aidan, Casas, Diego de Las, Guy, Aurelia, Jones, Chris, Bradbury, James, Johnson, Matthew, Hechtman, Blake, Weidinger, Laura, Gabriel, Iason, Isaac, William, Lockhart, Ed, Osindero, Simon, Rimell, Laura, Dyer, Chris, Vinyals, Oriol, Ayoub, Kareem, Stanway, Jeff, Bennett, Lorrayne, Hassabis, Demis, Kavukcuoglu, Koray, Irving, Geoffrey
arXiv.org Artificial Intelligence
Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
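The cross-entropy/compression equivalence the abstract ends on can be made concrete with a toy sketch (not from the paper): under a model that assigns probability p(c) to each character, an ideal entropy coder spends about -log2 p(c) bits per character, so the model's average cross entropy in bits per character is also its achievable compression rate. The unigram "model" below is a deliberately simple stand-in for a real language model.

```python
import math
from collections import Counter

def cross_entropy_bits(text, model):
    """Average -log2 p(char) under the model, in bits per character."""
    return sum(-math.log2(model[c]) for c in text) / len(text)

text = "abracadabra"
# Toy unigram "language model": p(c) = count(c) / len(text).
model = {c: n / len(text) for c, n in Counter(text).items()}

bpc = cross_entropy_bits(text, model)
# An ideal entropy coder (e.g. arithmetic coding) driven by this model
# would compress the text to roughly bpc * len(text) bits, so measuring
# cross entropy is measuring the compression rate Shannon described.
```

A better model assigns higher probability to the text, lowering both the cross entropy and the compressed size; this is the sense in which language-model evaluation and compression are the same measurement.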
Dec-8-2021