Goyal, Anirudh
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Xie, Yuxi, Goyal, Anirudh, Wu, Xiaobao, Yin, Xunjian, Xu, Xiao, Kan, Min-Yen, Pan, Liangming, Wang, William Yang
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges, we propose Context-Wise Order-Agnostic Language Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel within the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Empirical evaluations on reasoning tasks demonstrate that COrAL improves both accuracy and inference speed, achieving absolute accuracy gains of $4.6\%$ on GSM8K and $4.0\%$ on LogiQA, along with inference speedups of up to $3.9\times$ over next-token baselines. Preliminary results on code generation indicate a drop in pass rates due to inconsistencies in order-agnostic outputs, highlighting the inherent quality--speed trade-off. Our code is publicly available at https://github.com/YuxiXie/COrAL.
Can Models Learn Skill Composition from Examples?
Zhao, Haoyu, Kaur, Simran, Yu, Dingli, Goyal, Anirudh, Arora, Sanjeev
As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified $k$-tuple of language skills. While small models struggled with composing even with $k=3$, larger models like GPT-4 performed reasonably well with $k=5$ and $6$. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of $k$ skills. Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of $k$, revealed the following findings: (1) Training on combinations of $k=2$ and $3$ skills results in noticeable improvements in the ability to compose texts with $k=4$ and $5$ skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
Meo, Cristian, Sycheva, Ksenia, Goyal, Anirudh, Dauwels, Justin
It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.
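The low-rank decomposition described in the abstract above can be illustrated with a minimal sketch. It shows plain LoRA only (a frozen weight plus a trainable product of two low-rank matrices); the Bayesian gates over rank and quantization levels proposed in the paper are not reproduced. The function name `lora_forward`, the matrix sizes, and the `scale` parameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """Apply a frozen weight W plus the low-rank update B @ A to input x."""
    # Equivalent to x @ (W + scale * B @ A).T, without forming the full update.
    return x @ W.T + scale * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 128, 4        # illustrative sizes only (assumption)
W = rng.normal(size=(d_out, d_in))    # frozen pre-trained weight
A = rng.normal(size=(rank, d_in))     # trainable factor, rank x d_in
B = np.zeros((d_out, rank))           # trainable factor, zero-initialised so the update starts at 0

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B)

full_params = W.size                  # parameters touched by full fine-tuning
lora_params = A.size + B.size         # parameters touched by the low-rank update
print(full_params, lora_params)
```

With these sizes the low-rank factors hold 768 trainable values against 8192 in the full matrix, which is the saving that a fixed rank buys; the abstract's point is that one rank for every matrix is not necessarily the best choice.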
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Xie, Yuxi, Goyal, Anirudh, Zheng, Wenyue, Kan, Min-Yen, Lillicrap, Timothy P., Kawaguchi, Kenji, Shieh, Michael
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the importance of using on-policy sampled data for successful self-improvement. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and ARC-C, with substantial increases in accuracy to $81.8\%$ (+$5.9\%$), $34.7\%$ (+$5.8\%$), and $76.4\%$ (+$15.8\%$), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains. Our code is publicly available at https://github.com/YuxiXie/MCTS-DPO.
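As a rough illustration of the DPO update mentioned in the abstract above, the sketch below computes the standard DPO loss for a single preference pair. It assumes each input is the summed log-probability of a preferred or dispreferred reasoning step under the current policy and a frozen reference model; the MCTS data collection and self-evaluation described in the abstract are not modeled. The function name `dpo_loss` and the default `beta` are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    # Implicit reward margin: how much more the policy prefers the chosen step
    # over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: no preference expressed, loss equals log(2).
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
# Policy raised the chosen step and lowered the rejected one: loss drops below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy shifts probability mass toward the preferred step relative to the reference model, which is the signal that step-level preference pairs supply.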
Accelerating Greedy Coordinate Gradient via Probe Sampling
Zhao, Yiran, Zheng, Wenyue, Cai, Tianle, Do, Xuan Long, Kawaguchi, Kenji, Goyal, Anirudh, Shieh, Michael
The safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) has been shown to be effective in constructing adversarial prompts that break aligned LLMs, but GCG optimization is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work we study a new algorithm called $\texttt{Probe sampling}$. At the core of the algorithm is a mechanism that dynamically determines how similar a smaller draft model's predictions are to the target model's predictions for prompt candidates. When the target model is similar to the draft model, we rely heavily on the draft model to filter out a large number of potential prompt candidates. Probe sampling achieves up to a $5.6\times$ speedup using Llama2-7b-chat and leads to equal or improved attack success rate (ASR) on AdvBench. Furthermore, probe sampling is also able to accelerate other prompt optimization techniques and adversarial methods, leading to accelerations of $1.8\times$ for AutoPrompt, $2.4\times$ for APE, and $2.4\times$ for AutoDAN.
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs
Guo, Siyuan, Didolkar, Aniket, Ke, Nan Rosemary, Goyal, Anirudh, Huszár, Ferenc, Schölkopf, Bernhard
Motivated by the use of LLM as a scientific assistant, our paper assesses the domain knowledge of LLMs through their understanding of different mathematical skills required to solve problems. Understanding can be measured in two ways: the degree to which it solves problems correctly; and the degree to which it enables fast adaptation to new knowledge. Similarly, "understanding" in an LLM has two facets: on the one hand, pre-trained LLMs possess knowledge that allows remarkable performance in zero-shot tasks; on the other hand, pre-trained LLMs can learn new knowledge, either by leveraging in-context learning or by instruction-tuning from base parameters as initialization. We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from information during in-context learning or instruction-tuning through exploiting the ...
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Didolkar, Aniket, Goyal, Anirudh, Ke, Nan Rosemary, Guo, Siyuan, Valko, Michal, Lillicrap, Timothy, Rezende, Danilo, Bengio, Yoshua, Mozer, Michael, Arora, Sanjeev
Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes, we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in the math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8K and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Lyu, Kaifeng, Zhao, Haoyu, Gu, Xinran, Yu, Dingli, Goyal, Anirudh, Arora, Sanjeev
Public LLMs such as Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently, Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper finds that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the "Pure Tuning, Safe Testing" (PTST) principle -- fine-tune models without a safety prompt, but include it at test time. Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors, and even almost eliminates them in some cases.
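The PTST principle stated in the abstract above amounts to using two different prompt templates, which a small sketch can make concrete. Everything here is hypothetical: the wording of `SAFETY_PROMPT`, the `[SYSTEM]`/`[USER]`/`[ASSISTANT]` markers, and the helper `build_prompt` are illustrative placeholders, not the templates of Llama 2-Chat or any other model named in the abstract.

```python
# Hypothetical safety prompt wording (assumption, not from the paper).
SAFETY_PROMPT = "You are a helpful assistant. Refuse unsafe requests."

def build_prompt(user_message, include_safety_prompt):
    """Format one chat turn, optionally prefixed with the safety prompt."""
    parts = []
    if include_safety_prompt:
        parts.append(f"[SYSTEM] {SAFETY_PROMPT}")
    parts.append(f"[USER] {user_message}")
    parts.append("[ASSISTANT]")
    return "\n".join(parts)

# "Pure Tuning": fine-tuning examples are formatted without the safety prompt.
train_prompt = build_prompt("What is 12 * 7?", include_safety_prompt=False)

# "Safe Testing": the safety prompt is added back at inference time.
test_prompt = build_prompt("What is 12 * 7?", include_safety_prompt=True)

print(train_prompt)
print(test_prompt)
```

The only difference between the two calls is the flag, so the fine-tuning data and the deployed prompt deliberately disagree on whether the safety prompt is present.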
Can AI Be as Creative as Humans?
Wang, Haonan, Zou, James, Mozer, Michael, Goyal, Anirudh, Lamb, Alex, Zhang, Linjun, Su, Weijie J, Deng, Zhun, Xie, Michael Qizhe, Brown, Hannah, Kawaguchi, Kenji
Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators. Therefore, the debate on AI's creativity is reduced to the question of its ability to fit a sufficient amount of data. To arrive at this conclusion, this paper first addresses the complexities in defining creativity by introducing a new concept called Relative Creativity. Rather than attempting to define creativity universally, we shift the focus to whether AI can match the creative abilities of a hypothetical human. This methodological shift leads to a statistically quantifiable assessment of AI's creativity, termed Statistical Creativity. This concept, statistically comparing the creative abilities of AI with those of specific human groups, facilitates theoretical exploration of AI's creative potential. Our analysis reveals that by fitting extensive conditional data without marginalizing out the generative conditions, AI can emerge as a hypothetical new creator. The creator possesses creative abilities on par with the human creators it was trained on. Building on theoretical findings, we discuss the application in prompt-conditioned autoregressive models, providing a practical means for evaluating creative abilities of generative AI models, such as Large Language Models (LLMs). Additionally, this study provides an actionable training guideline, bridging the theoretical quantification of creativity with practical model training.
Gemini: A Family of Highly Capable Multimodal Models
Gemini Team, Anil, Rohan, Borgeaud, Sebastian, Wu, Yonghui, Alayrac, Jean-Baptiste, Yu, Jiahui, Soricut, Radu, Schalkwyk, Johan, Dai, Andrew M., Hauth, Anja, Millican, Katie, Silver, David, Petrov, Slav, Johnson, Melvin, Antonoglou, Ioannis, Schrittwieser, Julian, Glaese, Amelia, Chen, Jilin, Pitler, Emily, Lillicrap, Timothy, Lazaridou, Angeliki, Firat, Orhan, Molloy, James, Isard, Michael, Barham, Paul R., Hennigan, Tom, Lee, Benjamin, Viola, Fabio, Reynolds, Malcolm, Xu, Yuanzhong, Doherty, Ryan, Collins, Eli, Meyer, Clemens, Rutherford, Eliza, Moreira, Erica, Ayoub, Kareem, Goel, Megha, Tucker, George, Piqueras, Enrique, Krikun, Maxim, Barr, Iain, Savinov, Nikolay, Danihelka, Ivo, Roelofs, Becca, White, Anaïs, Andreassen, Anders, von Glehn, Tamara, Yagati, Lakshman, Kazemi, Mehran, Gonzalez, Lucas, Khalman, Misha, Sygnowski, Jakub, Frechette, Alexandre, Smith, Charlotte, Culp, Laura, Proleev, Lev, Luan, Yi, Chen, Xi, Lottes, James, Schucher, Nathan, Lebron, Federico, Rrustemi, Alban, Clay, Natalie, Crone, Phil, Kocisky, Tomas, Zhao, Jeffrey, Perz, Bartek, Yu, Dian, Howard, Heidi, Bloniarz, Adam, Rae, Jack W., Lu, Han, Sifre, Laurent, Maggioni, Marcello, Alcober, Fred, Garrette, Dan, Barnes, Megan, Thakoor, Shantanu, Austin, Jacob, Barth-Maron, Gabriel, Wong, William, Joshi, Rishabh, Chaabouni, Rahma, Fatiha, Deeni, Ahuja, Arun, Liu, Ruibo, Li, Yunxuan, Cogan, Sarah, Chen, Jeremy, Jia, Chao, Gu, Chenjie, Zhang, Qiao, Grimstad, Jordan, Hartman, Ale Jakse, Chadwick, Martin, Tomar, Gaurav Singh, Garcia, Xavier, Senter, Evan, Taropa, Emanuel, Pillai, Thanumalayan Sankaranarayana, Devlin, Jacob, Laskin, Michael, Casas, Diego de Las, Valter, Dasha, Tao, Connie, Blanco, Lorenzo, Badia, Adrià Puigdomènech, Reitter, David, Chen, Mianna, Brennan, Jenny, Rivera, Clara, Brin, Sergey, Iqbal, Shariq, Surita, Gabriela, Labanowski, Jane, Rao, Abhi, Winkler, Stephanie, Parisotto, Emilio, Gu, Yiming, Olszewska, Kate, Zhang, Yujing, Addanki, Ravi, Miech, Antoine, Louis, 
Annie, Shafey, Laurent El, Teplyashin, Denis, Brown, Geoff, Catt, Elliot, Attaluri, Nithya, Balaguer, Jan, Xiang, Jackie, Wang, Pidong, Ashwood, Zoe, Briukhov, Anton, Webson, Albert, Ganapathy, Sanjay, Sanghavi, Smit, Kannan, Ajay, Chang, Ming-Wei, Stjerngren, Axel, Djolonga, Josip, Sun, Yuting, Bapna, Ankur, Aitchison, Matthew, Pejman, Pedram, Michalewski, Henryk, Yu, Tianhe, Wang, Cindy, Love, Juliette, Ahn, Junwhan, Bloxwich, Dawn, Han, Kehang, Humphreys, Peter, Sellam, Thibault, Bradbury, James, Godbole, Varun, Samangooei, Sina, Damoc, Bogdan, Kaskasoli, Alex, Arnold, Sébastien M. R., Vasudevan, Vijay, Agrawal, Shubham, Riesa, Jason, Lepikhin, Dmitry, Tanburn, Richard, Srinivasan, Srivatsan, Lim, Hyeontaek, Hodkinson, Sarah, Shyam, Pranav, Ferret, Johan, Hand, Steven, Garg, Ankush, Paine, Tom Le, Li, Jian, Li, Yujia, Giang, Minh, Neitz, Alexander, Abbas, Zaheer, York, Sarah, Reid, Machel, Cole, Elizabeth, Chowdhery, Aakanksha, Das, Dipanjan, Rogozińska, Dominika, Nikolaev, Vitaly, Sprechmann, Pablo, Nado, Zachary, Zilka, Lukas, Prost, Flavien, He, Luheng, Monteiro, Marianne, Mishra, Gaurav, Welty, Chris, Newlan, Josh, Jia, Dawei, Allamanis, Miltiadis, Hu, Clara Huiyi, de Liedekerke, Raoul, Gilmer, Justin, Saroufim, Carl, Rijhwani, Shruti, Hou, Shaobo, Shrivastava, Disha, Baddepudi, Anirudh, Goldin, Alex, Ozturel, Adnan, Cassirer, Albin, Xu, Yunhan, Sohn, Daniel, Sachan, Devendra, Amplayo, Reinald Kim, Swanson, Craig, Petrova, Dessie, Narayan, Shashi, Guez, Arthur, Brahma, Siddhartha, Landon, Jessica, Patel, Miteyan, Zhao, Ruizhe, Villela, Kevin, Wang, Luyu, Jia, Wenhao, Rahtz, Matthew, Giménez, Mai, Yeung, Legg, Lin, Hanzhao, Keeling, James, Georgiev, Petko, Mincu, Diana, Wu, Boxi, Haykal, Salem, Saputro, Rachel, Vodrahalli, Kiran, Qin, James, Cankara, Zeynep, Sharma, Abhanshu, Fernando, Nick, Hawkins, Will, Neyshabur, Behnam, Kim, Solomon, Hutter, Adrian, Agrawal, Priyanka, Castro-Ros, Alex, Driessche, George van den, Wang, Tao, Yang, Fan, Chang, Shuo-yiin, 
Komarek, Paul, McIlroy, Ross, Lučić, Mario, Zhang, Guodong, Farhan, Wael, Sharman, Michael, Natsev, Paul, Michel, Paul, Cheng, Yong, Bansal, Yamini, Qiao, Siyuan, Cao, Kris, Shakeri, Siamak, Butterfield, Christina, Chung, Justin, Rubenstein, Paul Kishan, Agrawal, Shivani, Mensch, Arthur, Soparkar, Kedar, Lenc, Karel, Chung, Timothy, Pope, Aedan, Maggiore, Loren, Kay, Jackie, Jhakra, Priya, Wang, Shibo, Maynez, Joshua, Phuong, Mary, Tobin, Taylor, Tacchetti, Andrea, Trebacz, Maja, Robinson, Kevin, Katariya, Yash, Riedel, Sebastian, Bailey, Paige, Xiao, Kefan, Ghelani, Nimesh, Aroyo, Lora, Slone, Ambrose, Houlsby, Neil, Xiong, Xuehan, Yang, Zhen, Gribovskaya, Elena, Adler, Jonas, Wirth, Mateo, Lee, Lisa, Li, Music, Kagohara, Thais, Pavagadhi, Jay, Bridgers, Sophie, Bortsova, Anna, Ghemawat, Sanjay, Ahmed, Zafarali, Liu, Tianqi, Powell, Richard, Bolina, Vijay, Iinuma, Mariko, Zablotskaia, Polina, Besley, James, Chung, Da-Woon, Dozat, Timothy, Comanescu, Ramona, Si, Xiance, Greer, Jeremy, Su, Guolong, Polacek, Martin, Kaufman, Raphaël Lopez, Tokumine, Simon, Hu, Hexiang, Buchatskaya, Elena, Miao, Yingjie, Elhawaty, Mohamed, Siddhant, Aditya, Tomasev, Nenad, Xing, Jinwei, Greer, Christina, Miller, Helen, Ashraf, Shereen, Roy, Aurko, Zhang, Zizhao, Ma, Ada, Filos, Angelos, Besta, Milos, Blevins, Rory, Klimenko, Ted, Yeh, Chih-Kuan, Changpinyo, Soravit, Mu, Jiaqi, Chang, Oscar, Pajarskas, Mantas, Muir, Carrie, Cohen, Vered, Lan, Charline Le, Haridasan, Krishna, Marathe, Amit, Hansen, Steven, Douglas, Sholto, Samuel, Rajkumar, Wang, Mingqiu, Austin, Sophia, Lan, Chang, Jiang, Jiepu, Chiu, Justin, Lorenzo, Jaime Alonso, Sjösund, Lars Lowe, Cevey, Sébastien, Gleicher, Zach, Avrahami, Thi, Boral, Anudhyan, Srinivasan, Hansa, Selo, Vittorio, May, Rhys, Aisopos, Konstantinos, Hussenot, Léonard, Soares, Livio Baldini, Baumli, Kate, Chang, Michael B., Recasens, Adrià, Caine, Ben, Pritzel, Alexander, Pavetic, Filip, Pardo, Fabio, Gergely, Anita, Frye, Justin, Ramasesh, Vinay, 
Horgan, Dan, Badola, Kartikeya, Kassner, Nora, Roy, Subhrajit, Dyer, Ethan, Campos, Víctor, Tomala, Alex, Tang, Yunhao, Badawy, Dalia El, White, Elspeth, Mustafa, Basil, Lang, Oran, Jindal, Abhishek, Vikram, Sharad, Gong, Zhitao, Caelles, Sergi, Hemsley, Ross, Thornton, Gregory, Feng, Fangxiaoyu, Stokowiec, Wojciech, Zheng, Ce, Thacker, Phoebe, Ünlü, Çağlar, Zhang, Zhishuai, Saleh, Mohammad, Svensson, James, Bileschi, Max, Patil, Piyush, Anand, Ankesh, Ring, Roman, Tsihlas, Katerina, Vezer, Arpi, Selvi, Marco, Shevlane, Toby, Rodriguez, Mikel, Kwiatkowski, Tom, Daruki, Samira, Rong, Keran, Dafoe, Allan, FitzGerald, Nicholas, Gu-Lemberg, Keren, Khan, Mina, Hendricks, Lisa Anne, Pellat, Marie, Feinberg, Vladimir, Cobon-Kerr, James, Sainath, Tara, Rauh, Maribeth, Hashemi, Sayed Hadi, Ives, Richard, Hasson, Yana, Li, YaGuang, Noland, Eric, Cao, Yuan, Byrd, Nathan, Hou, Le, Wang, Qingze, Sottiaux, Thibault, Paganini, Michela, Lespiau, Jean-Baptiste, Moufarek, Alexandre, Hassan, Samer, Shivakumar, Kaushik, van Amersfoort, Joost, Mandhane, Amol, Joshi, Pratik, Goyal, Anirudh, Tung, Matthew, Brock, Andrew, Sheahan, Hannah, Misra, Vedant, Li, Cheng, Rakićević, Nemanja, Dehghani, Mostafa, Liu, Fangyu, Mittal, Sid, Oh, Junhyuk, Noury, Seb, Sezener, Eren, Huot, Fantine, Lamm, Matthew, De Cao, Nicola, Chen, Charlie, Elsayed, Gamaleldin, Chi, Ed, Mahdieh, Mahdis, Tenney, Ian, Hua, Nan, Petrychenko, Ivan, Kane, Patrick, Scandinaro, Dylan, Jain, Rishub, Uesato, Jonathan, Datta, Romina, Sadovsky, Adam, Bunyan, Oskar, Rabiej, Dominik, Wu, Shimu, Zhang, John, Vasudevan, Gautam, Leurent, Edouard, Alnahlawi, Mahmoud, Georgescu, Ionut, Wei, Nan, Zheng, Ivy, Chan, Betty, Rabinovitch, Pam G, Stanczyk, Piotr, Zhang, Ye, Steiner, David, Naskar, Subhajit, Azzam, Michael, Johnson, Matthew, Paszke, Adam, Chiu, Chung-Cheng, Elias, Jaume Sanchez, Mohiuddin, Afroz, Muhammad, Faizan, Miao, Jin, Lee, Andrew, Vieillard, Nino, Potluri, Sahitya, Park, Jane, Davoodi, Elnaz, Zhang, Jiageng, Stanway, 
Jeff, Garmon, Drew, Karmarkar, Abhijit, Dong, Zhe, Lee, Jong, Kumar, Aviral, Zhou, Luowei, Evens, Jonathan, Isaac, William, Chen, Zhe, Jia, Johnson, Levskaya, Anselm, Zhu, Zhenkai, Gorgolewski, Chris, Grabowski, Peter, Mao, Yu, Magni, Alberto, Yao, Kaisheng, Snaider, Javier, Casagrande, Norman, Suganthan, Paul, Palmer, Evan, Irving, Geoffrey, Loper, Edward, Faruqui, Manaal, Arkatkar, Isha, Chen, Nanxin, Shafran, Izhak, Fink, Michael, Castaño, Alfonso, Giannoumis, Irene, Kim, Wooyeol, Rybiński, Mikołaj, Sreevatsa, Ashwin, Prendki, Jennifer, Soergel, David, Goedeckemeyer, Adrian, Gierke, Willi, Jafari, Mohsen, Gaba, Meenu, Wiesner, Jeremy, Wright, Diana Gage, Wei, Yawen, Vashisht, Harsha, Kulizhskaya, Yana, Hoover, Jay, Le, Maigo, Li, Lu, Iwuanyanwu, Chimezie, Liu, Lu, Ramirez, Kevin, Khorlin, Andrey, Cui, Albert, LIN, Tian, Georgiev, Marin, Wu, Marcus, Aguilar, Ricardo, Pallo, Keith, Chakladar, Abhishek, Repina, Alena, Wu, Xihui, van der Weide, Tom, Ponnapalli, Priya, Kaplan, Caroline, Simsa, Jiri, Li, Shuangfeng, Dousse, Olivier, Yang, Fan, Piper, Jeff, Ie, Nathan, Lui, Minnie, Pasumarthi, Rama, Lintz, Nathan, Vijayakumar, Anitha, Thiet, Lam Nguyen, Andor, Daniel, Valenzuela, Pedro, Paduraru, Cosmin, Peng, Daiyi, Lee, Katherine, Zhang, Shuyuan, Greene, Somer, Nguyen, Duc Dung, Kurylowicz, Paula, Velury, Sarmishta, Krause, Sebastian, Hardin, Cassidy, Dixon, Lucas, Janzer, Lili, Choo, Kiam, Feng, Ziqiang, Zhang, Biao, Singhal, Achintya, Latkar, Tejasi, Zhang, Mingyang, Le, Quoc, Abellan, Elena Allica, Du, Dayou, McKinnon, Dan, Antropova, Natasha, Bolukbasi, Tolga, Keller, Orgad, Reid, David, Finchelstein, Daniel, Raad, Maria Abi, Crocker, Remi, Hawkins, Peter, Dadashi, Robert, Gaffney, Colin, Lall, Sid, Franko, Ken, Filonov, Egor, Bulanova, Anna, Leblond, Rémi, Yadav, Vikas, Chung, Shirley, Askham, Harry, Cobo, Luis C., Xu, Kelvin, Fischer, Felix, Xu, Jun, Sorokin, Christina, Alberti, Chris, Lin, Chu-Cheng, Evans, Colin, Zhou, Hao, Dimitriev, Alek, Forbes, Hannah, 
Banarse, Dylan, Tung, Zora, Liu, Jeremiah, Omernick, Mark, Bishop, Colton, Kumar, Chintu, Sterneck, Rachel, Foley, Ryan, Jain, Rohan, Mishra, Swaroop, Xia, Jiawei, Bos, Taylor, Cideron, Geoffrey, Amid, Ehsan, Piccinno, Francesco, Wang, Xingyu, Banzal, Praseem, Gurita, Petru, Noga, Hila, Shah, Premal, Mankowitz, Daniel J., Polozov, Alex, Kushman, Nate, Krakovna, Victoria, Brown, Sasha, Bateni, MohammadHossein, Duan, Dennis, Firoiu, Vlad, Thotakuri, Meghana, Natan, Tom, Mohananey, Anhad, Geist, Matthieu, Mudgal, Sidharth, Girgin, Sertan, Li, Hui, Ye, Jiayu, Roval, Ofir, Tojo, Reiko, Kwong, Michael, Lee-Thorp, James, Yew, Christopher, Yuan, Quan, Bagri, Sumit, Sinopalnikov, Danila, Ramos, Sabela, Mellor, John, Sharma, Abhishek, Severyn, Aliaksei, Lai, Jonathan, Wu, Kathy, Cheng, Heng-Tze, Miller, David, Sonnerat, Nicolas, Vnukov, Denis, Greig, Rory, Beattie, Jennifer, Caveness, Emily, Bai, Libin, Eisenschlos, Julian, Korchemniy, Alex, Tsai, Tomy, Jasarevic, Mimi, Kong, Weize, Dao, Phuong, Zheng, Zeyu, Liu, Frederick, Yang, Fan, Zhu, Rui, Geller, Mark, Teh, Tian Huey, Sanmiya, Jason, Gladchenko, Evgeny, Trdin, Nejc, Sozanschi, Andrei, Toyama, Daniel, Rosen, Evan, Tavakkol, Sasan, Xue, Linting, Elkind, Chen, Woodman, Oliver, Carpenter, John, Papamakarios, George, Kemp, Rupert, Kafle, Sushant, Grunina, Tanya, Sinha, Rishika, Talbert, Alice, Goyal, Abhimanyu, Wu, Diane, Owusu-Afriyie, Denese, Du, Cosmo, Thornton, Chloe, Pont-Tuset, Jordi, Narayana, Pradyumna, Li, Jing, Fatehi, Sabaer, Wieting, John, Ajmeri, Omar, Uria, Benigno, Zhu, Tao, Ko, Yeongil, Knight, Laura, Héliou, Amélie, Niu, Ning, Gu, Shane, Pang, Chenxi, Tran, Dustin, Li, Yeqing, Levine, Nir, Stolovich, Ariel, Kalb, Norbert, Santamaria-Fernandez, Rebeca, Goenka, Sonam, Yustalim, Wenny, Strudel, Robin, Elqursh, Ali, Lakshminarayanan, Balaji, Deck, Charlie, Upadhyay, Shyam, Lee, Hyo, Dusenberry, Mike, Li, Zonglin, Wang, Xuezhi, Levin, Kyle, Hoffmann, Raphael, Holtmann-Rice, Dan, Bachem, Olivier, Yue, Summer, 
Arora, Sho, Malmi, Eric, Mirylenka, Daniil, Tan, Qijun, Koh, Christy, Yeganeh, Soheil Hassas, Põder, Siim, Zheng, Steven, Pongetti, Francesco, Tariq, Mukarram, Sun, Yanhua, Ionita, Lucian, Seyedhosseini, Mojtaba, Tafti, Pouya, Kotikalapudi, Ragha, Liu, Zhiyu, Gulati, Anmol, Liu, Jasmine, Ye, Xinyu, Chrzaszcz, Bart, Wang, Lily, Sethi, Nikhil, Li, Tianrun, Brown, Ben, Singh, Shreya, Fan, Wei, Parisi, Aaron, Stanton, Joe, Kuang, Chenkai, Koverkathu, Vinod, Choquette-Choo, Christopher A., Li, Yunjie, Lu, TJ, Ittycheriah, Abe, Shroff, Prakash, Sun, Pei, Varadarajan, Mani, Bahargam, Sanaz, Willoughby, Rob, Gaddy, David, Dasgupta, Ishita, Desjardins, Guillaume, Cornero, Marco, Robenek, Brona, Mittal, Bhavishya, Albrecht, Ben, Shenoy, Ashish, Moiseev, Fedor, Jacobsson, Henrik, Ghaffarkhah, Alireza, Rivière, Morgane, Walton, Alanna, Crepy, Clément, Parrish, Alicia, Liu, Yuan, Zhou, Zongwei, Farabet, Clement, Radebaugh, Carey, Srinivasan, Praveen, van der Salm, Claudia, Fidjeland, Andreas, Scellato, Salvatore, Latorre-Chimoto, Eri, Klimczak-Plucińska, Hanna, Bridson, David, de Cesare, Dario, Hudson, Tom, Mendolicchio, Piermaria, Walker, Lexi, Morris, Alex, Penchev, Ivo, Mauger, Matthew, Guseynov, Alexey, Reid, Alison, Odoom, Seth, Loher, Lucia, Cotruta, Victor, Yenugula, Madhavi, Grewe, Dominik, Petrushkina, Anastasia, Duerig, Tom, Sanchez, Antonio, Yadlowsky, Steve, Shen, Amy, Globerson, Amir, Kurzrok, Adam, Webb, Lynette, Dua, Sahil, Li, Dong, Lahoti, Preethi, Bhupatiraju, Surya, Hurt, Dan, Qureshi, Haroon, Agarwal, Ananth, Shani, Tomer, Eyal, Matan, Khare, Anuj, Belle, Shreyas Rammohan, Wang, Lei, Tekur, Chetan, Kale, Mihir Sanjay, Wei, Jinliang, Sang, Ruoxin, Saeta, Brennan, Liechty, Tyler, Sun, Yi, Zhao, Yao, Lee, Stephan, Nayak, Pandu, Fritz, Doug, Vuyyuru, Manish Reddy, Aslanides, John, Vyas, Nidhi, Wicke, Martin, Ma, Xiao, Bilal, Taylan, Eltyshev, Evgenii, Balle, Daniel, Martin, Nina, Cate, Hardie, Manyika, James, Amiri, Keyvan, Kim, Yelin, Xiong, Xi, Kang, Kai, 
Luisier, Florian, Tripuraneni, Nilesh, Madras, David, Guo, Mandy, Waters, Austin, Wang, Oliver, Ainslie, Joshua, Baldridge, Jason, Zhang, Han, Pruthi, Garima, Bauer, Jakob, Yang, Feng, Mansour, Riham, Gelman, Jason, Xu, Yang, Polovets, George, Liu, Ji, Cai, Honglong, Chen, Warren, Sheng, XiangHai, Xue, Emily, Ozair, Sherjil, Yu, Adams, Angermueller, Christof, Li, Xiaowei, Wang, Weiren, Wiesinger, Julia, Koukoumidis, Emmanouil, Tian, Yuan, Iyer, Anand, Gurumurthy, Madhu, Goldenson, Mark, Shah, Parashar, Blake, MK, Yu, Hongkun, Urbanowicz, Anthony, Palomaki, Jennimaria, Fernando, Chrisantha, Brooks, Kevin, Durden, Ken, Mehta, Harsh, Momchev, Nikola, Rahimtoroghi, Elahe, Georgaki, Maria, Raul, Amit, Ruder, Sebastian, Redshaw, Morgan, Lee, Jinhyuk, Jalan, Komal, Li, Dinghua, Perng, Ginger, Hechtman, Blake, Schuh, Parker, Nasr, Milad, Chen, Mia, Milan, Kieran, Mikulik, Vladimir, Strohman, Trevor, Franco, Juliana, Green, Tim, Hassabis, Demis, Kavukcuoglu, Koray, Dean, Jeffrey, Vinyals, Oriol
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.