Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Rae, Jack W., Borgeaud, Sebastian, Cai, Trevor, Millican, Katie, Hoffmann, Jordan, Song, Francis, Aslanides, John, Henderson, Sarah, Ring, Roman, Young, Susannah, Rutherford, Eliza, Hennigan, Tom, Menick, Jacob, Cassirer, Albin, Powell, Richard, Driessche, George van den, Hendricks, Lisa Anne, Rauh, Maribeth, Huang, Po-Sen, Glaese, Amelia, Welbl, Johannes, Dathathri, Sumanth, Huang, Saffron, Uesato, Jonathan, Mellor, John, Higgins, Irina, Creswell, Antonia, McAleese, Nat, Wu, Amy, Elsen, Erich, Jayakumar, Siddhant, Buchatskaya, Elena, Budden, David, Sutherland, Esme, Simonyan, Karen, Paganini, Michela, Sifre, Laurent, Martens, Lena, Li, Xiang Lorraine, Kuncoro, Adhiguna, Nematzadeh, Aida, Gribovskaya, Elena, Donato, Domenic, Lazaridou, Angeliki, Mensch, Arthur, Lespiau, Jean-Baptiste, Tsimpoukelli, Maria, Grigorev, Nikolai, Fritz, Doug, Sottiaux, Thibault, Pajarskas, Mantas, Pohlen, Toby, Gong, Zhitao, Toyama, Daniel, d'Autume, Cyprien de Masson, Li, Yujia, Terzi, Tayfun, Mikulik, Vladimir, Babuschkin, Igor, Clark, Aidan, Casas, Diego de Las, Guy, Aurelia, Jones, Chris, Bradbury, James, Johnson, Matthew, Hechtman, Blake, Weidinger, Laura, Gabriel, Iason, Isaac, William, Lockhart, Ed, Osindero, Simon, Rimell, Laura, Dyer, Chris, Vinyals, Oriol, Ayoub, Kareem, Stanway, Jeff, Bennett, Lorrayne, Hassabis, Demis, Kavukcuoglu, Koray, Irving, Geoffrey

Dec-8-2021–arXiv.org Artificial Intelligence

Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
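
The link between cross entropy and compression rate can be illustrated with a small sketch. It is not taken from the paper: the add-one smoothed character unigram model, the toy strings, and all function names are assumptions chosen for illustration. The sketch assigns each character a Shannon code length of ceil(-log2 p) bits and compares the resulting average code length with the model's cross entropy on the same text.

```python
import math
from collections import Counter

def unigram_model(text, alphabet):
    """Add-one smoothed character probabilities (a toy stand-in for a language model)."""
    counts = Counter(text)
    total = len(text) + len(alphabet)
    return {ch: (counts[ch] + 1) / total for ch in alphabet}

def cross_entropy_bits(model, text):
    """Average negative log2-probability the model assigns to each character."""
    return -sum(math.log2(model[ch]) for ch in text) / len(text)

def shannon_code_lengths(model):
    """Integer code lengths ceil(-log2 p); these satisfy the Kraft inequality."""
    return {ch: math.ceil(-math.log2(p)) for ch, p in model.items()}

# Hypothetical toy data, chosen only to make the example self-contained.
train = "the cat sat on the mat and the rat sat on the hat"
test = "the hat sat on the cat"
alphabet = sorted(set(train + test))

model = unigram_model(train, alphabet)
h = cross_entropy_bits(model, test)
lengths = shannon_code_lengths(model)

# A Kraft sum of at most 1 means a prefix-free code with these lengths exists.
kraft_sum = sum(2.0 ** -l for l in lengths.values())
avg_code_bits = sum(lengths[ch] for ch in test) / len(test)

print(f"cross entropy: {h:.3f} bits/char")
print(f"Shannon code:  {avg_code_bits:.3f} bits/char")
```

Because each code length lies within one bit of -log2 p, the average code length falls between the cross entropy and the cross entropy plus one bit per character. An arithmetic coder driven by the same probabilities would narrow that gap to a small constant number of bits over the whole message, which is why a lower cross entropy corresponds to a better compression rate.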

big-bench collaboration, gender and occupation bias, information processing system

Genre:
- Personal (1.00)
- Overview (1.00)
- Research Report
  - New Finding (1.00)
  - Promising Solution (0.67)

Industry:
- Leisure & Entertainment (1.00)
- Law (1.00)
- Media (1.00)
- Energy (0.92)
- Information Technology > Security & Privacy (0.67)
- Banking & Finance > Economy (0.67)
- Health & Medicine
  - Consumer Health (0.92)
  - Therapeutic Area
    - Psychiatry/Psychology (0.92)
    - Infections and Infectious Diseases (0.92)
- Government
  - Voting & Elections (0.67)
  - Regional Government > North America Government
    - United States Government (1.00)
- Education
  - Curriculum > Subject-Specific Education (1.00)
  - Educational Setting (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Commonsense Reasoning (0.92)
  - Natural Language
    - Text Processing (1.00)
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.67)