Riesa, Jason
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
Goldman, Omer, Shaham, Uri, Malkin, Dan, Eiger, Sivan, Hassidim, Avinatan, Matias, Yossi, Maynez, Joshua, Gilady, Adi Mayrav, Riesa, Jason, Rijhwani, Shruti, Rimell, Laura, Szpektor, Idan, Tsarfaty, Reut, Eyal, Matan
To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs' capability of cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-book QA (CBQA) dataset that Evaluates Cross-Lingual Knowledge Transfer in a simple, black-box manner. We detected information with uneven coverage across languages by controlling for presence and absence of Wikipedia articles in 12 languages. We generated knowledge-seeking questions in a source language, for which the answer appears in a relevant Wikipedia article, and translated them to all other 11 languages, for which the respective Wikipedias lack equivalent articles. Assuming that Wikipedia reflects the prominent knowledge in the LLM's training data, to solve ECLeKTic's CBQA task the model is required to transfer knowledge between languages. Experimenting with 8 LLMs, we show that SOTA models struggle to effectively share knowledge across languages, even if they can predict the answer well for queries in the same language the knowledge was acquired in.
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
Deutsch, Daniel, Briakou, Eleftheria, Caswell, Isaac, Finkelstein, Mara, Galor, Rebecca, Juraska, Juraj, Kovacs, Geza, Lui, Alison, Rei, Ricardo, Riesa, Jason, Rijhwani, Shruti, Riley, Parker, Salesky, Elizabeth, Trabelsi, Firas, Winkler, Stephanie, Zhang, Biao, Freitag, Markus
As large language models (LLMs) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects in addition to post-edits of the references in 8 out of 9 languages in the original WMT24 dataset. The dataset covers four domains: literary, news, social, and speech. We benchmark a variety of MT providers and LLMs on the collected dataset using automatic metrics and find that LLMs are the best-performing MT systems in all 55 languages. These results should be confirmed using a human-based evaluation, which we leave for future work.
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, null, Georgiev, Petko, Lei, Ving Ian, Burnell, Ryan, Bai, Libin, Gulati, Anmol, Tanzer, Garrett, Vincent, Damien, Pan, Zhufeng, Wang, Shibo, Mariooryad, Soroosh, Ding, Yifan, Geng, Xinyang, Alcober, Fred, Frostig, Roy, Omernick, Mark, Walker, Lexi, Paduraru, Cosmin, Sorokin, Christina, Tacchetti, Andrea, Gaffney, Colin, Daruki, Samira, Sercinoglu, Olcan, Gleicher, Zach, Love, Juliette, Voigtlaender, Paul, Jain, Rohan, Surita, Gabriela, Mohamed, Kareem, Blevins, Rory, Ahn, Junwhan, Zhu, Tao, Kawintiranon, Kornraphop, Firat, Orhan, Gu, Yiming, Zhang, Yujing, Rahtz, Matthew, Faruqui, Manaal, Clay, Natalie, Gilmer, Justin, Co-Reyes, JD, Penchev, Ivo, Zhu, Rui, Morioka, Nobuyuki, Hui, Kevin, Haridasan, Krishna, Campos, Victor, Mahdieh, Mahdis, Guo, Mandy, Hassan, Samer, Kilgour, Kevin, Vezer, Arpi, Cheng, Heng-Tze, de Liedekerke, Raoul, Goyal, Siddharth, Barham, Paul, Strouse, DJ, Noury, Seb, Adler, Jonas, Sundararajan, Mukund, Vikram, Sharad, Lepikhin, Dmitry, Paganini, Michela, Garcia, Xavier, Yang, Fan, Valter, Dasha, Trebacz, Maja, Vodrahalli, Kiran, Asawaroengchai, Chulayuth, Ring, Roman, Kalb, Norbert, Soares, Livio Baldini, Brahma, Siddhartha, Steiner, David, Yu, Tianhe, Mentzer, Fabian, He, Antoine, Gonzalez, Lucas, Xu, Bibo, Kaufman, Raphael Lopez, Shafey, Laurent El, Oh, Junhyuk, Hennigan, Tom, Driessche, George van den, Odoom, Seth, Lucic, Mario, Roelofs, Becca, Lall, Sid, Marathe, Amit, Chan, Betty, Ontanon, Santiago, He, Luheng, Teplyashin, Denis, Lai, Jonathan, Crone, Phil, Damoc, Bogdan, Ho, Lewis, Riedel, Sebastian, Lenc, Karel, Yeh, Chih-Kuan, Chowdhery, Aakanksha, Xu, Yang, Kazemi, Mehran, Amid, Ehsan, Petrushkina, Anastasia, Swersky, Kevin, Khodaei, Ali, Chen, Gowoon, Larkin, Chris, Pinto, Mario, Yan, Geng, Badia, Adria Puigdomenech, Patil, Piyush, Hansen, Steven, Orr, Dave, Arnold, Sebastien M. 
R., Grimstad, Jordan, Dai, Andrew, Douglas, Sholto, Sinha, Rishika, Yadav, Vikas, Chen, Xi, Gribovskaya, Elena, Austin, Jacob, Zhao, Jeffrey, Patel, Kaushal, Komarek, Paul, Austin, Sophia, Borgeaud, Sebastian, Friso, Linda, Goyal, Abhimanyu, Caine, Ben, Cao, Kris, Chung, Da-Woon, Lamm, Matthew, Barth-Maron, Gabe, Kagohara, Thais, Olszewska, Kate, Chen, Mia, Shivakumar, Kaushik, Agarwal, Rishabh, Godhia, Harshal, Rajwar, Ravi, Snaider, Javier, Dotiwalla, Xerxes, Liu, Yuan, Barua, Aditya, Ungureanu, Victor, Zhang, Yuan, Batsaikhan, Bat-Orgil, Wirth, Mateo, Qin, James, Danihelka, Ivo, Doshi, Tulsee, Chadwick, Martin, Chen, Jilin, Jain, Sanil, Le, Quoc, Kar, Arjun, Gurumurthy, Madhu, Li, Cheng, Sang, Ruoxin, Liu, Fangyu, Lamprou, Lampros, Munoz, Rich, Lintz, Nathan, Mehta, Harsh, Howard, Heidi, Reynolds, Malcolm, Aroyo, Lora, Wang, Quan, Blanco, Lorenzo, Cassirer, Albin, Griffith, Jordan, Das, Dipanjan, Lee, Stephan, Sygnowski, Jakub, Fisher, Zach, Besley, James, Powell, Richard, Ahmed, Zafarali, Paulus, Dominik, Reitter, David, Borsos, Zalan, Joshi, Rishabh, Pope, Aedan, Hand, Steven, Selo, Vittorio, Jain, Vihan, Sethi, Nikhil, Goel, Megha, Makino, Takaki, May, Rhys, Yang, Zhen, Schalkwyk, Johan, Butterfield, Christina, Hauth, Anja, Goldin, Alex, Hawkins, Will, Senter, Evan, Brin, Sergey, Woodman, Oliver, Ritter, Marvin, Noland, Eric, Giang, Minh, Bolina, Vijay, Lee, Lisa, Blyth, Tim, Mackinnon, Ian, Reid, Machel, Sarvana, Obaid, Silver, David, Chen, Alexander, Wang, Lily, Maggiore, Loren, Chang, Oscar, Attaluri, Nithya, Thornton, Gregory, Chiu, Chung-Cheng, Bunyan, Oskar, Levine, Nir, Chung, Timothy, Eltyshev, Evgenii, Si, Xiance, Lillicrap, Timothy, Brady, Demetra, Aggarwal, Vaibhav, Wu, Boxi, Xu, Yuanzhong, McIlroy, Ross, Badola, Kartikeya, Sandhu, Paramjit, Moreira, Erica, Stokowiec, Wojciech, Hemsley, Ross, Li, Dong, Tudor, Alex, Shyam, Pranav, Rahimtoroghi, Elahe, Haykal, Salem, Sprechmann, Pablo, Zhou, Xiang, Mincu, Diana, Li, Yujia, Addanki, Ravi, Krishna, 
Kalpesh, Wu, Xiao, Frechette, Alexandre, Eyal, Matan, Dafoe, Allan, Lacey, Dave, Whang, Jay, Avrahami, Thi, Zhang, Ye, Taropa, Emanuel, Lin, Hanzhao, Toyama, Daniel, Rutherford, Eliza, Sano, Motoki, Choe, HyunJeong, Tomala, Alex, Safranek-Shrader, Chalence, Kassner, Nora, Pajarskas, Mantas, Harvey, Matt, Sechrist, Sean, Fortunato, Meire, Lyu, Christina, Elsayed, Gamaleldin, Kuang, Chenkai, Lottes, James, Chu, Eric, Jia, Chao, Chen, Chih-Wei, Humphreys, Peter, Baumli, Kate, Tao, Connie, Samuel, Rajkumar, Santos, Cicero Nogueira dos, Andreassen, Anders, Rakićević, Nemanja, Grewe, Dominik, Kumar, Aviral, Winkler, Stephanie, Caton, Jonathan, Brock, Andrew, Dalmia, Sid, Sheahan, Hannah, Barr, Iain, Miao, Yingjie, Natsev, Paul, Devlin, Jacob, Behbahani, Feryal, Prost, Flavien, Sun, Yanhua, Myaskovsky, Artiom, Pillai, Thanumalayan Sankaranarayana, Hurt, Dan, Lazaridou, Angeliki, Xiong, Xi, Zheng, Ce, Pardo, Fabio, Li, Xiaowei, Horgan, Dan, Stanton, Joe, Ambar, Moran, Xia, Fei, Lince, Alejandro, Wang, Mingqiu, Mustafa, Basil, Webson, Albert, Lee, Hyo, Anil, Rohan, Wicke, Martin, Dozat, Timothy, Sinha, Abhishek, Piqueras, Enrique, Dabir, Elahe, Upadhyay, Shyam, Boral, Anudhyan, Hendricks, Lisa Anne, Fry, Corey, Djolonga, Josip, Su, Yi, Walker, Jake, Labanowski, Jane, Huang, Ronny, Misra, Vedant, Chen, Jeremy, Skerry-Ryan, RJ, Singh, Avi, Rijhwani, Shruti, Yu, Dian, Castro-Ros, Alex, Changpinyo, Beer, Datta, Romina, Bagri, Sumit, Hrafnkelsson, Arnar Mar, Maggioni, Marcello, Zheng, Daniel, Sulsky, Yury, Hou, Shaobo, Paine, Tom Le, Yang, Antoine, Riesa, Jason, Rogozinska, Dominika, Marcus, Dror, Badawy, Dalia El, Zhang, Qiao, Wang, Luyu, Miller, Helen, Greer, Jeremy, Sjos, Lars Lowe, Nova, Azade, Zen, Heiga, Chaabouni, Rahma, Rosca, Mihaela, Jiang, Jiepu, Chen, Charlie, Liu, Ruibo, Sainath, Tara, Krikun, Maxim, Polozov, Alex, Lespiau, Jean-Baptiste, Newlan, Josh, Cankara, Zeyncep, Kwak, Soo, Xu, Yunhan, Chen, Phil, Coenen, Andy, Meyer, Clemens, Tsihlas, Katerina, Ma, Ada, 
Gottweis, Juraj, Xing, Jinwei, Gu, Chenjie, Miao, Jin, Frank, Christian, Cankara, Zeynep, Ganapathy, Sanjay, Dasgupta, Ishita, Hughes-Fitt, Steph, Chen, Heng, Reid, David, Rong, Keran, Fan, Hongmin, van Amersfoort, Joost, Zhuang, Vincent, Cohen, Aaron, Gu, Shixiang Shane, Mohananey, Anhad, Ilic, Anastasija, Tobin, Taylor, Wieting, John, Bortsova, Anna, Thacker, Phoebe, Wang, Emma, Caveness, Emily, Chiu, Justin, Sezener, Eren, Kaskasoli, Alex, Baker, Steven, Millican, Katie, Elhawaty, Mohamed, Aisopos, Kostas, Lebsack, Carl, Byrd, Nathan, Dai, Hanjun, Jia, Wenhao, Wiethoff, Matthew, Davoodi, Elnaz, Weston, Albert, Yagati, Lakshman, Ahuja, Arun, Gao, Isabel, Pundak, Golan, Zhang, Susan, Azzam, Michael, Sim, Khe Chai, Caelles, Sergi, Keeling, James, Sharma, Abhanshu, Swing, Andy, Li, YaGuang, Liu, Chenxi, Bostock, Carrie Grimes, Bansal, Yamini, Nado, Zachary, Anand, Ankesh, Lipschultz, Josh, Karmarkar, Abhijit, Proleev, Lev, Ittycheriah, Abe, Yeganeh, Soheil Hassas, Polovets, George, Faust, Aleksandra, Sun, Jiao, Rrustemi, Alban, Li, Pen, Shivanna, Rakesh, Liu, Jeremiah, Welty, Chris, Lebron, Federico, Baddepudi, Anirudh, Krause, Sebastian, Parisotto, Emilio, Soricut, Radu, Xu, Zheng, Bloxwich, Dawn, Johnson, Melvin, Neyshabur, Behnam, Mao-Jones, Justin, Wang, Renshen, Ramasesh, Vinay, Abbas, Zaheer, Guez, Arthur, Segal, Constant, Nguyen, Duc Dung, Svensson, James, Hou, Le, York, Sarah, Milan, Kieran, Bridgers, Sophie, Gworek, Wiktor, Tagliasacchi, Marco, Lee-Thorp, James, Chang, Michael, Guseynov, Alexey, Hartman, Ale Jakse, Kwong, Michael, Zhao, Ruizhe, Kashem, Sheleem, Cole, Elizabeth, Miech, Antoine, Tanburn, Richard, Phuong, Mary, Pavetic, Filip, Cevey, Sebastien, Comanescu, Ramona, Ives, Richard, Yang, Sherry, Du, Cosmo, Li, Bo, Zhang, Zizhao, Iinuma, Mariko, Hu, Clara Huiyi, Roy, Aurko, Bijwadia, Shaan, Zhu, Zhenkai, Martins, Danilo, Saputro, Rachel, Gergely, Anita, Zheng, Steven, Jia, Dawei, Antonoglou, Ioannis, Sadovsky, Adam, Gu, Shane, Bi, Yingying, 
Andreev, Alek, Samangooei, Sina, Khan, Mina, Kocisky, Tomas, Filos, Angelos, Kumar, Chintu, Bishop, Colton, Yu, Adams, Hodkinson, Sarah, Mittal, Sid, Shah, Premal, Moufarek, Alexandre, Cheng, Yong, Bloniarz, Adam, Lee, Jaehoon, Pejman, Pedram, Michel, Paul, Spencer, Stephen, Feinberg, Vladimir, Xiong, Xuehan, Savinov, Nikolay, Smith, Charlotte, Shakeri, Siamak, Tran, Dustin, Chesus, Mary, Bohnet, Bernd, Tucker, George, von Glehn, Tamara, Muir, Carrie, Mao, Yiran, Kazawa, Hideto, Slone, Ambrose, Soparkar, Kedar, Shrivastava, Disha, Cobon-Kerr, James, Sharman, Michael, Pavagadhi, Jay, Araya, Carlos, Misiunas, Karolis, Ghelani, Nimesh, Laskin, Michael, Barker, David, Li, Qiujia, Briukhov, Anton, Houlsby, Neil, Glaese, Mia, Lakshminarayanan, Balaji, Schucher, Nathan, Tang, Yunhao, Collins, Eli, Lim, Hyeontaek, Feng, Fangxiaoyu, Recasens, Adria, Lai, Guangda, Magni, Alberto, De Cao, Nicola, Siddhant, Aditya, Ashwood, Zoe, Orbay, Jordi, Dehghani, Mostafa, Brennan, Jenny, He, Yifan, Xu, Kelvin, Gao, Yang, Saroufim, Carl, Molloy, James, Wu, Xinyi, Arnold, Seb, Chang, Solomon, Schrittwieser, Julian, Buchatskaya, Elena, Radpour, Soroush, Polacek, Martin, Giordano, Skye, Bapna, Ankur, Tokumine, Simon, Hellendoorn, Vincent, Sottiaux, Thibault, Cogan, Sarah, Severyn, Aliaksei, Saleh, Mohammad, Thakoor, Shantanu, Shefey, Laurent, Qiao, Siyuan, Gaba, Meenu, Chang, Shuo-yiin, Swanson, Craig, Zhang, Biao, Lee, Benjamin, Rubenstein, Paul Kishan, Song, Gan, Kwiatkowski, Tom, Koop, Anna, Kannan, Ajay, Kao, David, Schuh, Parker, Stjerngren, Axel, Ghiasi, Golnaz, Gibson, Gena, Vilnis, Luke, Yuan, Ye, Ferreira, Felipe Tiengo, Kamath, Aishwarya, Klimenko, Ted, Franko, Ken, Xiao, Kefan, Bhattacharya, Indro, Patel, Miteyan, Wang, Rui, Morris, Alex, Strudel, Robin, Sharma, Vivek, Choy, Peter, Hashemi, Sayed Hadi, Landon, Jessica, Finkelstein, Mara, Jhakra, Priya, Frye, Justin, Barnes, Megan, Mauger, Matthew, Daun, Dennis, Baatarsukh, Khuslen, Tung, Matthew, Farhan, Wael, Michalewski, Henryk, 
Viola, Fabio, Quitry, Felix de Chaumont, Lan, Charline Le, Hudson, Tom, Wang, Qingze, Fischer, Felix, Zheng, Ivy, White, Elspeth, Dragan, Anca, Alayrac, Jean-baptiste, Ni, Eric, Pritzel, Alexander, Iwanicki, Adam, Isard, Michael, Bulanova, Anna, Zilka, Lukas, Dyer, Ethan, Sachan, Devendra, Srinivasan, Srivatsan, Muckenhirn, Hannah, Cai, Honglong, Mandhane, Amol, Tariq, Mukarram, Rae, Jack W., Wang, Gary, Ayoub, Kareem, FitzGerald, Nicholas, Zhao, Yao, Han, Woohyun, Alberti, Chris, Garrette, Dan, Krishnakumar, Kashyap, Gimenez, Mai, Levskaya, Anselm, Sohn, Daniel, Matak, Josip, Iturrate, Inaki, Chang, Michael B., Xiang, Jackie, Cao, Yuan, Ranka, Nishant, Brown, Geoff, Hutter, Adrian, Mirrokni, Vahab, Chen, Nanxin, Yao, Kaisheng, Egyed, Zoltan, Galilee, Francois, Liechty, Tyler, Kallakuri, Praveen, Palmer, Evan, Ghemawat, Sanjay, Liu, Jasmine, Tao, David, Thornton, Chloe, Green, Tim, Jasarevic, Mimi, Lin, Sharon, Cotruta, Victor, Tan, Yi-Xuan, Fiedel, Noah, Yu, Hongkun, Chi, Ed, Neitz, Alexander, Heitkaemper, Jens, Sinha, Anu, Zhou, Denny, Sun, Yi, Kaed, Charbel, Hulse, Brice, Mishra, Swaroop, Georgaki, Maria, Kudugunta, Sneha, Farabet, Clement, Shafran, Izhak, Vlasic, Daniel, Tsitsulin, Anton, Ananthanarayanan, Rajagopal, Carin, Alen, Su, Guolong, Sun, Pei, V, Shashank, Carvajal, Gabriel, Broder, Josef, Comsa, Iulia, Repina, Alena, Wong, William, Chen, Warren Weilun, Hawkins, Peter, Filonov, Egor, Loher, Lucia, Hirnschall, Christoph, Wang, Weiyi, Ye, Jingchen, Burns, Andrea, Cate, Hardie, Wright, Diana Gage, Piccinini, Federico, Zhang, Lei, Lin, Chu-Cheng, Gog, Ionel, Kulizhskaya, Yana, Sreevatsa, Ashwin, Song, Shuang, Cobo, Luis C., Iyer, Anand, Tekur, Chetan, Garrido, Guillermo, Xiao, Zhuyun, Kemp, Rupert, Zheng, Huaixiu Steven, Li, Hui, Agarwal, Ananth, Ngani, Christel, Goshvadi, Kati, Santamaria-Fernandez, Rebeca, Fica, Wojciech, Chen, Xinyun, Gorgolewski, Chris, Sun, Sean, Garg, Roopal, Ye, Xinyu, Eslami, S. M. 
Ali, Hua, Nan, Simon, Jon, Joshi, Pratik, Kim, Yelin, Tenney, Ian, Potluri, Sahitya, Thiet, Lam Nguyen, Yuan, Quan, Luisier, Florian, Chronopoulou, Alexandra, Scellato, Salvatore, Srinivasan, Praveen, Chen, Minmin, Koverkathu, Vinod, Dalibard, Valentin, Xu, Yaming, Saeta, Brennan, Anderson, Keith, Sellam, Thibault, Fernando, Nick, Huot, Fantine, Jung, Junehyuk, Varadarajan, Mani, Quinn, Michael, Raul, Amit, Le, Maigo, Habalov, Ruslan, Clark, Jon, Jalan, Komal, Bullard, Kalesha, Singhal, Achintya, Luong, Thang, Wang, Boyu, Rajayogam, Sujeevan, Eisenschlos, Julian, Jia, Johnson, Finchelstein, Daniel, Yakubovich, Alex, Balle, Daniel, Fink, Michael, Agarwal, Sameer, Li, Jing, Dvijotham, Dj, Pal, Shalini, Kang, Kai, Konzelmann, Jaclyn, Beattie, Jennifer, Dousse, Olivier, Wu, Diane, Crocker, Remi, Elkind, Chen, Jonnalagadda, Siddhartha Reddy, Lee, Jong, Holtmann-Rice, Dan, Kallarackal, Krystal, Liu, Rosanne, Vnukov, Denis, Vats, Neera, Invernizzi, Luca, Jafari, Mohsen, Zhou, Huanjie, Taylor, Lilly, Prendki, Jennifer, Wu, Marcus, Eccles, Tom, Liu, Tianqi, Kopparapu, Kavya, Beaufays, Francoise, Angermueller, Christof, Marzoca, Andreea, Sarcar, Shourya, Dib, Hilal, Stanway, Jeff, Perbet, Frank, Trdin, Nejc, Sterneck, Rachel, Khorlin, Andrey, Li, Dinghua, Wu, Xihui, Goenka, Sonam, Madras, David, Goldshtein, Sasha, Gierke, Willi, Zhou, Tong, Liu, Yaxin, Liang, Yannie, White, Anais, Li, Yunjie, Singh, Shreya, Bahargam, Sanaz, Epstein, Mark, Basu, Sujoy, Lao, Li, Ozturel, Adnan, Crous, Carl, Zhai, Alex, Lu, Han, Tung, Zora, Gaur, Neeraj, Walton, Alanna, Dixon, Lucas, Zhang, Ming, Globerson, Amir, Uy, Grant, Bolt, Andrew, Wiles, Olivia, Nasr, Milad, Shumailov, Ilia, Selvi, Marco, Piccinno, Francesco, Aguilar, Ricardo, McCarthy, Sara, Khalman, Misha, Shukla, Mrinal, Galic, Vlado, Carpenter, John, Villela, Kevin, Zhang, Haibin, Richardson, Harry, Martens, James, Bosnjak, Matko, Belle, Shreyas Rammohan, Seibert, Jeff, Alnahlawi, Mahmoud, McWilliams, Brian, Singh, Sankalp, Louis, 
Annie, Ding, Wen, Popovici, Dan, Simicich, Lenin, Knight, Laura, Mehta, Pulkit, Gupta, Nishesh, Shi, Chongyang, Fatehi, Saaber, Mitrovic, Jovana, Grills, Alex, Pagadora, Joseph, Petrova, Dessie, Eisenbud, Danielle, Zhang, Zhishuai, Yates, Damion, Mittal, Bhavishya, Tripuraneni, Nilesh, Assael, Yannis, Brovelli, Thomas, Jain, Prateek, Velimirovic, Mihajlo, Akbulut, Canfer, Mu, Jiaqi, Macherey, Wolfgang, Kumar, Ravin, Xu, Jun, Qureshi, Haroon, Comanici, Gheorghe, Wiesner, Jeremy, Gong, Zhitao, Ruddock, Anton, Bauer, Matthias, Felt, Nick, GP, Anirudh, Arnab, Anurag, Zelle, Dustin, Rothfuss, Jonas, Rosgen, Bill, Shenoy, Ashish, Seybold, Bryan, Li, Xinjian, Mudigonda, Jayaram, Erdogan, Goker, Xia, Jiawei, Simsa, Jiri, Michi, Andrea, Yao, Yi, Yew, Christopher, Kan, Steven, Caswell, Isaac, Radebaugh, Carey, Elisseeff, Andre, Valenzuela, Pedro, McKinney, Kay, Paterson, Kim, Cui, Albert, Latorre-Chimoto, Eri, Kim, Solomon, Zeng, William, Durden, Ken, Ponnapalli, Priya, Sosea, Tiberiu, Choquette-Choo, Christopher A., Manyika, James, Robenek, Brona, Vashisht, Harsha, Pereira, Sebastien, Lam, Hoi, Velic, Marko, Owusu-Afriyie, Denese, Lee, Katherine, Bolukbasi, Tolga, Parrish, Alicia, Lu, Shawn, Park, Jane, Venkatraman, Balaji, Talbert, Alice, Rosique, Lambert, Cheng, Yuchung, Sozanschi, Andrei, Paszke, Adam, Kumar, Praveen, Austin, Jessica, Li, Lu, Salama, Khalid, Kim, Wooyeol, Dukkipati, Nandita, Baryshnikov, Anthony, Kaplanis, Christos, Sheng, XiangHai, Chervonyi, Yuri, Unlu, Caglar, Casas, Diego de Las, Askham, Harry, Tunyasuvunakool, Kathryn, Gimeno, Felix, Poder, Siim, Kwak, Chester, Miecnikowski, Matt, Mirrokni, Vahab, Dimitriev, Alek, Parisi, Aaron, Liu, Dangyi, Tsai, Tomy, Shevlane, Toby, Kouridi, Christina, Garmon, Drew, Goedeckemeyer, Adrian, Brown, Adam R., Vijayakumar, Anitha, Elqursh, Ali, Jazayeri, Sadegh, Huang, Jin, Carthy, Sara Mc, Hoover, Jay, Kim, Lucy, Kumar, Sandeep, Chen, Wei, Biles, Courtney, Bingham, Garrett, Rosen, Evan, Wang, Lisa, Tan, Qijun, Engel, 
David, Pongetti, Francesco, de Cesare, Dario, Hwang, Dongseong, Yu, Lily, Pullman, Jennifer, Narayanan, Srini, Levin, Kyle, Gopal, Siddharth, Li, Megan, Aharoni, Asaf, Trinh, Trieu, Lo, Jessica, Casagrande, Norman, Vij, Roopali, Matthey, Loic, Ramadhana, Bramandia, Matthews, Austin, Carey, CJ, Johnson, Matthew, Goranova, Kremena, Shah, Rohin, Ashraf, Shereen, Dasgupta, Kingshuk, Larsen, Rasmus, Wang, Yicheng, Vuyyuru, Manish Reddy, Jiang, Chong, Ijazi, Joana, Osawa, Kazuki, Smith, Celine, Boppana, Ramya Sree, Bilal, Taylan, Koizumi, Yuma, Xu, Ying, Altun, Yasemin, Shabat, Nir, Bariach, Ben, Korchemniy, Alex, Choo, Kiam, Ronneberger, Olaf, Iwuanyanwu, Chimezie, Zhao, Shubin, Soergel, David, Hsieh, Cho-Jui, Cai, Irene, Iqbal, Shariq, Sundermeyer, Martin, Chen, Zhe, Bursztein, Elie, Malaviya, Chaitanya, Biadsy, Fadi, Shroff, Prakash, Dhillon, Inderjit, Latkar, Tejasi, Dyer, Chris, Forbes, Hannah, Nicosia, Massimo, Nikolaev, Vitaly, Greene, Somer, Georgiev, Marin, Wang, Pidong, Martin, Nina, Sedghi, Hanie, Zhang, John, Banzal, Praseem, Fritz, Doug, Rao, Vikram, Wang, Xuezhi, Zhang, Jiageng, Patraucean, Viorica, Du, Dayou, Mordatch, Igor, Jurin, Ivan, Liu, Lewis, Dubey, Ayush, Mohan, Abhi, Nowakowski, Janek, Ion, Vlad-Doru, Wei, Nan, Tojo, Reiko, Raad, Maria Abi, Hudson, Drew A., Keshava, Vaishakh, Agrawal, Shubham, Ramirez, Kevin, Wu, Zhichun, Nguyen, Hoang, Liu, Ji, Sewak, Madhavi, Petrini, Bryce, Choi, DongHyun, Philips, Ivan, Wang, Ziyue, Bica, Ioana, Garg, Ankush, Wilkiewicz, Jarek, Agrawal, Priyanka, Li, Xiaowei, Guo, Danhao, Xue, Emily, Shaik, Naseer, Leach, Andrew, Khan, Sadh MNM, Wiesinger, Julia, Jerome, Sammy, Chakladar, Abhishek, Wang, Alek Wenjiao, Ornduff, Tina, Abu, Folake, Ghaffarkhah, Alireza, Wainwright, Marcus, Cortes, Mario, Liu, Frederick, Maynez, Joshua, Petrov, Slav, Wu, Yonghui, Hassabis, Demis, Kavukcuoglu, Koray, Dean, Jeffrey, Vinyals, Oriol
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Gemini: A Family of Highly Capable Multimodal Models
Gemini Team, null, Anil, Rohan, Borgeaud, Sebastian, Wu, Yonghui, Alayrac, Jean-Baptiste, Yu, Jiahui, Soricut, Radu, Schalkwyk, Johan, Dai, Andrew M., Hauth, Anja, Millican, Katie, Silver, David, Petrov, Slav, Johnson, Melvin, Antonoglou, Ioannis, Schrittwieser, Julian, Glaese, Amelia, Chen, Jilin, Pitler, Emily, Lillicrap, Timothy, Lazaridou, Angeliki, Firat, Orhan, Molloy, James, Isard, Michael, Barham, Paul R., Hennigan, Tom, Lee, Benjamin, Viola, Fabio, Reynolds, Malcolm, Xu, Yuanzhong, Doherty, Ryan, Collins, Eli, Meyer, Clemens, Rutherford, Eliza, Moreira, Erica, Ayoub, Kareem, Goel, Megha, Tucker, George, Piqueras, Enrique, Krikun, Maxim, Barr, Iain, Savinov, Nikolay, Danihelka, Ivo, Roelofs, Becca, White, Anaïs, Andreassen, Anders, von Glehn, Tamara, Yagati, Lakshman, Kazemi, Mehran, Gonzalez, Lucas, Khalman, Misha, Sygnowski, Jakub, Frechette, Alexandre, Smith, Charlotte, Culp, Laura, Proleev, Lev, Luan, Yi, Chen, Xi, Lottes, James, Schucher, Nathan, Lebron, Federico, Rrustemi, Alban, Clay, Natalie, Crone, Phil, Kocisky, Tomas, Zhao, Jeffrey, Perz, Bartek, Yu, Dian, Howard, Heidi, Bloniarz, Adam, Rae, Jack W., Lu, Han, Sifre, Laurent, Maggioni, Marcello, Alcober, Fred, Garrette, Dan, Barnes, Megan, Thakoor, Shantanu, Austin, Jacob, Barth-Maron, Gabriel, Wong, William, Joshi, Rishabh, Chaabouni, Rahma, Fatiha, Deeni, Ahuja, Arun, Liu, Ruibo, Li, Yunxuan, Cogan, Sarah, Chen, Jeremy, Jia, Chao, Gu, Chenjie, Zhang, Qiao, Grimstad, Jordan, Hartman, Ale Jakse, Chadwick, Martin, Tomar, Gaurav Singh, Garcia, Xavier, Senter, Evan, Taropa, Emanuel, Pillai, Thanumalayan Sankaranarayana, Devlin, Jacob, Laskin, Michael, Casas, Diego de Las, Valter, Dasha, Tao, Connie, Blanco, Lorenzo, Badia, Adrià Puigdomènech, Reitter, David, Chen, Mianna, Brennan, Jenny, Rivera, Clara, Brin, Sergey, Iqbal, Shariq, Surita, Gabriela, Labanowski, Jane, Rao, Abhi, Winkler, Stephanie, Parisotto, Emilio, Gu, Yiming, Olszewska, Kate, Zhang, Yujing, Addanki, Ravi, Miech, Antoine, Louis, 
Annie, Shafey, Laurent El, Teplyashin, Denis, Brown, Geoff, Catt, Elliot, Attaluri, Nithya, Balaguer, Jan, Xiang, Jackie, Wang, Pidong, Ashwood, Zoe, Briukhov, Anton, Webson, Albert, Ganapathy, Sanjay, Sanghavi, Smit, Kannan, Ajay, Chang, Ming-Wei, Stjerngren, Axel, Djolonga, Josip, Sun, Yuting, Bapna, Ankur, Aitchison, Matthew, Pejman, Pedram, Michalewski, Henryk, Yu, Tianhe, Wang, Cindy, Love, Juliette, Ahn, Junwhan, Bloxwich, Dawn, Han, Kehang, Humphreys, Peter, Sellam, Thibault, Bradbury, James, Godbole, Varun, Samangooei, Sina, Damoc, Bogdan, Kaskasoli, Alex, Arnold, Sébastien M. R., Vasudevan, Vijay, Agrawal, Shubham, Riesa, Jason, Lepikhin, Dmitry, Tanburn, Richard, Srinivasan, Srivatsan, Lim, Hyeontaek, Hodkinson, Sarah, Shyam, Pranav, Ferret, Johan, Hand, Steven, Garg, Ankush, Paine, Tom Le, Li, Jian, Li, Yujia, Giang, Minh, Neitz, Alexander, Abbas, Zaheer, York, Sarah, Reid, Machel, Cole, Elizabeth, Chowdhery, Aakanksha, Das, Dipanjan, Rogozińska, Dominika, Nikolaev, Vitaly, Sprechmann, Pablo, Nado, Zachary, Zilka, Lukas, Prost, Flavien, He, Luheng, Monteiro, Marianne, Mishra, Gaurav, Welty, Chris, Newlan, Josh, Jia, Dawei, Allamanis, Miltiadis, Hu, Clara Huiyi, de Liedekerke, Raoul, Gilmer, Justin, Saroufim, Carl, Rijhwani, Shruti, Hou, Shaobo, Shrivastava, Disha, Baddepudi, Anirudh, Goldin, Alex, Ozturel, Adnan, Cassirer, Albin, Xu, Yunhan, Sohn, Daniel, Sachan, Devendra, Amplayo, Reinald Kim, Swanson, Craig, Petrova, Dessie, Narayan, Shashi, Guez, Arthur, Brahma, Siddhartha, Landon, Jessica, Patel, Miteyan, Zhao, Ruizhe, Villela, Kevin, Wang, Luyu, Jia, Wenhao, Rahtz, Matthew, Giménez, Mai, Yeung, Legg, Lin, Hanzhao, Keeling, James, Georgiev, Petko, Mincu, Diana, Wu, Boxi, Haykal, Salem, Saputro, Rachel, Vodrahalli, Kiran, Qin, James, Cankara, Zeynep, Sharma, Abhanshu, Fernando, Nick, Hawkins, Will, Neyshabur, Behnam, Kim, Solomon, Hutter, Adrian, Agrawal, Priyanka, Castro-Ros, Alex, Driessche, George van den, Wang, Tao, Yang, Fan, Chang, Shuo-yiin, 
Komarek, Paul, McIlroy, Ross, Lučić, Mario, Zhang, Guodong, Farhan, Wael, Sharman, Michael, Natsev, Paul, Michel, Paul, Cheng, Yong, Bansal, Yamini, Qiao, Siyuan, Cao, Kris, Shakeri, Siamak, Butterfield, Christina, Chung, Justin, Rubenstein, Paul Kishan, Agrawal, Shivani, Mensch, Arthur, Soparkar, Kedar, Lenc, Karel, Chung, Timothy, Pope, Aedan, Maggiore, Loren, Kay, Jackie, Jhakra, Priya, Wang, Shibo, Maynez, Joshua, Phuong, Mary, Tobin, Taylor, Tacchetti, Andrea, Trebacz, Maja, Robinson, Kevin, Katariya, Yash, Riedel, Sebastian, Bailey, Paige, Xiao, Kefan, Ghelani, Nimesh, Aroyo, Lora, Slone, Ambrose, Houlsby, Neil, Xiong, Xuehan, Yang, Zhen, Gribovskaya, Elena, Adler, Jonas, Wirth, Mateo, Lee, Lisa, Li, Music, Kagohara, Thais, Pavagadhi, Jay, Bridgers, Sophie, Bortsova, Anna, Ghemawat, Sanjay, Ahmed, Zafarali, Liu, Tianqi, Powell, Richard, Bolina, Vijay, Iinuma, Mariko, Zablotskaia, Polina, Besley, James, Chung, Da-Woon, Dozat, Timothy, Comanescu, Ramona, Si, Xiance, Greer, Jeremy, Su, Guolong, Polacek, Martin, Kaufman, Raphaël Lopez, Tokumine, Simon, Hu, Hexiang, Buchatskaya, Elena, Miao, Yingjie, Elhawaty, Mohamed, Siddhant, Aditya, Tomasev, Nenad, Xing, Jinwei, Greer, Christina, Miller, Helen, Ashraf, Shereen, Roy, Aurko, Zhang, Zizhao, Ma, Ada, Filos, Angelos, Besta, Milos, Blevins, Rory, Klimenko, Ted, Yeh, Chih-Kuan, Changpinyo, Soravit, Mu, Jiaqi, Chang, Oscar, Pajarskas, Mantas, Muir, Carrie, Cohen, Vered, Lan, Charline Le, Haridasan, Krishna, Marathe, Amit, Hansen, Steven, Douglas, Sholto, Samuel, Rajkumar, Wang, Mingqiu, Austin, Sophia, Lan, Chang, Jiang, Jiepu, Chiu, Justin, Lorenzo, Jaime Alonso, Sjösund, Lars Lowe, Cevey, Sébastien, Gleicher, Zach, Avrahami, Thi, Boral, Anudhyan, Srinivasan, Hansa, Selo, Vittorio, May, Rhys, Aisopos, Konstantinos, Hussenot, Léonard, Soares, Livio Baldini, Baumli, Kate, Chang, Michael B., Recasens, Adrià, Caine, Ben, Pritzel, Alexander, Pavetic, Filip, Pardo, Fabio, Gergely, Anita, Frye, Justin, Ramasesh, Vinay, 
Horgan, Dan, Badola, Kartikeya, Kassner, Nora, Roy, Subhrajit, Dyer, Ethan, Campos, Víctor, Tomala, Alex, Tang, Yunhao, Badawy, Dalia El, White, Elspeth, Mustafa, Basil, Lang, Oran, Jindal, Abhishek, Vikram, Sharad, Gong, Zhitao, Caelles, Sergi, Hemsley, Ross, Thornton, Gregory, Feng, Fangxiaoyu, Stokowiec, Wojciech, Zheng, Ce, Thacker, Phoebe, Ünlü, Çağlar, Zhang, Zhishuai, Saleh, Mohammad, Svensson, James, Bileschi, Max, Patil, Piyush, Anand, Ankesh, Ring, Roman, Tsihlas, Katerina, Vezer, Arpi, Selvi, Marco, Shevlane, Toby, Rodriguez, Mikel, Kwiatkowski, Tom, Daruki, Samira, Rong, Keran, Dafoe, Allan, FitzGerald, Nicholas, Gu-Lemberg, Keren, Khan, Mina, Hendricks, Lisa Anne, Pellat, Marie, Feinberg, Vladimir, Cobon-Kerr, James, Sainath, Tara, Rauh, Maribeth, Hashemi, Sayed Hadi, Ives, Richard, Hasson, Yana, Li, YaGuang, Noland, Eric, Cao, Yuan, Byrd, Nathan, Hou, Le, Wang, Qingze, Sottiaux, Thibault, Paganini, Michela, Lespiau, Jean-Baptiste, Moufarek, Alexandre, Hassan, Samer, Shivakumar, Kaushik, van Amersfoort, Joost, Mandhane, Amol, Joshi, Pratik, Goyal, Anirudh, Tung, Matthew, Brock, Andrew, Sheahan, Hannah, Misra, Vedant, Li, Cheng, Rakićević, Nemanja, Dehghani, Mostafa, Liu, Fangyu, Mittal, Sid, Oh, Junhyuk, Noury, Seb, Sezener, Eren, Huot, Fantine, Lamm, Matthew, De Cao, Nicola, Chen, Charlie, Elsayed, Gamaleldin, Chi, Ed, Mahdieh, Mahdis, Tenney, Ian, Hua, Nan, Petrychenko, Ivan, Kane, Patrick, Scandinaro, Dylan, Jain, Rishub, Uesato, Jonathan, Datta, Romina, Sadovsky, Adam, Bunyan, Oskar, Rabiej, Dominik, Wu, Shimu, Zhang, John, Vasudevan, Gautam, Leurent, Edouard, Alnahlawi, Mahmoud, Georgescu, Ionut, Wei, Nan, Zheng, Ivy, Chan, Betty, Rabinovitch, Pam G, Stanczyk, Piotr, Zhang, Ye, Steiner, David, Naskar, Subhajit, Azzam, Michael, Johnson, Matthew, Paszke, Adam, Chiu, Chung-Cheng, Elias, Jaume Sanchez, Mohiuddin, Afroz, Muhammad, Faizan, Miao, Jin, Lee, Andrew, Vieillard, Nino, Potluri, Sahitya, Park, Jane, Davoodi, Elnaz, Zhang, Jiageng, Stanway, 
Jeff, Garmon, Drew, Karmarkar, Abhijit, Dong, Zhe, Lee, Jong, Kumar, Aviral, Zhou, Luowei, Evens, Jonathan, Isaac, William, Chen, Zhe, Jia, Johnson, Levskaya, Anselm, Zhu, Zhenkai, Gorgolewski, Chris, Grabowski, Peter, Mao, Yu, Magni, Alberto, Yao, Kaisheng, Snaider, Javier, Casagrande, Norman, Suganthan, Paul, Palmer, Evan, Irving, Geoffrey, Loper, Edward, Faruqui, Manaal, Arkatkar, Isha, Chen, Nanxin, Shafran, Izhak, Fink, Michael, Castaño, Alfonso, Giannoumis, Irene, Kim, Wooyeol, Rybiński, Mikołaj, Sreevatsa, Ashwin, Prendki, Jennifer, Soergel, David, Goedeckemeyer, Adrian, Gierke, Willi, Jafari, Mohsen, Gaba, Meenu, Wiesner, Jeremy, Wright, Diana Gage, Wei, Yawen, Vashisht, Harsha, Kulizhskaya, Yana, Hoover, Jay, Le, Maigo, Li, Lu, Iwuanyanwu, Chimezie, Liu, Lu, Ramirez, Kevin, Khorlin, Andrey, Cui, Albert, LIN, Tian, Georgiev, Marin, Wu, Marcus, Aguilar, Ricardo, Pallo, Keith, Chakladar, Abhishek, Repina, Alena, Wu, Xihui, van der Weide, Tom, Ponnapalli, Priya, Kaplan, Caroline, Simsa, Jiri, Li, Shuangfeng, Dousse, Olivier, Yang, Fan, Piper, Jeff, Ie, Nathan, Lui, Minnie, Pasumarthi, Rama, Lintz, Nathan, Vijayakumar, Anitha, Thiet, Lam Nguyen, Andor, Daniel, Valenzuela, Pedro, Paduraru, Cosmin, Peng, Daiyi, Lee, Katherine, Zhang, Shuyuan, Greene, Somer, Nguyen, Duc Dung, Kurylowicz, Paula, Velury, Sarmishta, Krause, Sebastian, Hardin, Cassidy, Dixon, Lucas, Janzer, Lili, Choo, Kiam, Feng, Ziqiang, Zhang, Biao, Singhal, Achintya, Latkar, Tejasi, Zhang, Mingyang, Le, Quoc, Abellan, Elena Allica, Du, Dayou, McKinnon, Dan, Antropova, Natasha, Bolukbasi, Tolga, Keller, Orgad, Reid, David, Finchelstein, Daniel, Raad, Maria Abi, Crocker, Remi, Hawkins, Peter, Dadashi, Robert, Gaffney, Colin, Lall, Sid, Franko, Ken, Filonov, Egor, Bulanova, Anna, Leblond, Rémi, Yadav, Vikas, Chung, Shirley, Askham, Harry, Cobo, Luis C., Xu, Kelvin, Fischer, Felix, Xu, Jun, Sorokin, Christina, Alberti, Chris, Lin, Chu-Cheng, Evans, Colin, Zhou, Hao, Dimitriev, Alek, Forbes, Hannah, 
Banarse, Dylan, Tung, Zora, Liu, Jeremiah, Omernick, Mark, Bishop, Colton, Kumar, Chintu, Sterneck, Rachel, Foley, Ryan, Jain, Rohan, Mishra, Swaroop, Xia, Jiawei, Bos, Taylor, Cideron, Geoffrey, Amid, Ehsan, Piccinno, Francesco, Wang, Xingyu, Banzal, Praseem, Gurita, Petru, Noga, Hila, Shah, Premal, Mankowitz, Daniel J., Polozov, Alex, Kushman, Nate, Krakovna, Victoria, Brown, Sasha, Bateni, MohammadHossein, Duan, Dennis, Firoiu, Vlad, Thotakuri, Meghana, Natan, Tom, Mohananey, Anhad, Geist, Matthieu, Mudgal, Sidharth, Girgin, Sertan, Li, Hui, Ye, Jiayu, Roval, Ofir, Tojo, Reiko, Kwong, Michael, Lee-Thorp, James, Yew, Christopher, Yuan, Quan, Bagri, Sumit, Sinopalnikov, Danila, Ramos, Sabela, Mellor, John, Sharma, Abhishek, Severyn, Aliaksei, Lai, Jonathan, Wu, Kathy, Cheng, Heng-Tze, Miller, David, Sonnerat, Nicolas, Vnukov, Denis, Greig, Rory, Beattie, Jennifer, Caveness, Emily, Bai, Libin, Eisenschlos, Julian, Korchemniy, Alex, Tsai, Tomy, Jasarevic, Mimi, Kong, Weize, Dao, Phuong, Zheng, Zeyu, Liu, Frederick, Yang, Fan, Zhu, Rui, Geller, Mark, Teh, Tian Huey, Sanmiya, Jason, Gladchenko, Evgeny, Trdin, Nejc, Sozanschi, Andrei, Toyama, Daniel, Rosen, Evan, Tavakkol, Sasan, Xue, Linting, Elkind, Chen, Woodman, Oliver, Carpenter, John, Papamakarios, George, Kemp, Rupert, Kafle, Sushant, Grunina, Tanya, Sinha, Rishika, Talbert, Alice, Goyal, Abhimanyu, Wu, Diane, Owusu-Afriyie, Denese, Du, Cosmo, Thornton, Chloe, Pont-Tuset, Jordi, Narayana, Pradyumna, Li, Jing, Fatehi, Sabaer, Wieting, John, Ajmeri, Omar, Uria, Benigno, Zhu, Tao, Ko, Yeongil, Knight, Laura, Héliou, Amélie, Niu, Ning, Gu, Shane, Pang, Chenxi, Tran, Dustin, Li, Yeqing, Levine, Nir, Stolovich, Ariel, Kalb, Norbert, Santamaria-Fernandez, Rebeca, Goenka, Sonam, Yustalim, Wenny, Strudel, Robin, Elqursh, Ali, Lakshminarayanan, Balaji, Deck, Charlie, Upadhyay, Shyam, Lee, Hyo, Dusenberry, Mike, Li, Zonglin, Wang, Xuezhi, Levin, Kyle, Hoffmann, Raphael, Holtmann-Rice, Dan, Bachem, Olivier, Yue, Summer, 
Arora, Sho, Malmi, Eric, Mirylenka, Daniil, Tan, Qijun, Koh, Christy, Yeganeh, Soheil Hassas, Põder, Siim, Zheng, Steven, Pongetti, Francesco, Tariq, Mukarram, Sun, Yanhua, Ionita, Lucian, Seyedhosseini, Mojtaba, Tafti, Pouya, Kotikalapudi, Ragha, Liu, Zhiyu, Gulati, Anmol, Liu, Jasmine, Ye, Xinyu, Chrzaszcz, Bart, Wang, Lily, Sethi, Nikhil, Li, Tianrun, Brown, Ben, Singh, Shreya, Fan, Wei, Parisi, Aaron, Stanton, Joe, Kuang, Chenkai, Koverkathu, Vinod, Choquette-Choo, Christopher A., Li, Yunjie, Lu, TJ, Ittycheriah, Abe, Shroff, Prakash, Sun, Pei, Varadarajan, Mani, Bahargam, Sanaz, Willoughby, Rob, Gaddy, David, Dasgupta, Ishita, Desjardins, Guillaume, Cornero, Marco, Robenek, Brona, Mittal, Bhavishya, Albrecht, Ben, Shenoy, Ashish, Moiseev, Fedor, Jacobsson, Henrik, Ghaffarkhah, Alireza, Rivière, Morgane, Walton, Alanna, Crepy, Clément, Parrish, Alicia, Liu, Yuan, Zhou, Zongwei, Farabet, Clement, Radebaugh, Carey, Srinivasan, Praveen, van der Salm, Claudia, Fidjeland, Andreas, Scellato, Salvatore, Latorre-Chimoto, Eri, Klimczak-Plucińska, Hanna, Bridson, David, de Cesare, Dario, Hudson, Tom, Mendolicchio, Piermaria, Walker, Lexi, Morris, Alex, Penchev, Ivo, Mauger, Matthew, Guseynov, Alexey, Reid, Alison, Odoom, Seth, Loher, Lucia, Cotruta, Victor, Yenugula, Madhavi, Grewe, Dominik, Petrushkina, Anastasia, Duerig, Tom, Sanchez, Antonio, Yadlowsky, Steve, Shen, Amy, Globerson, Amir, Kurzrok, Adam, Webb, Lynette, Dua, Sahil, Li, Dong, Lahoti, Preethi, Bhupatiraju, Surya, Hurt, Dan, Qureshi, Haroon, Agarwal, Ananth, Shani, Tomer, Eyal, Matan, Khare, Anuj, Belle, Shreyas Rammohan, Wang, Lei, Tekur, Chetan, Kale, Mihir Sanjay, Wei, Jinliang, Sang, Ruoxin, Saeta, Brennan, Liechty, Tyler, Sun, Yi, Zhao, Yao, Lee, Stephan, Nayak, Pandu, Fritz, Doug, Vuyyuru, Manish Reddy, Aslanides, John, Vyas, Nidhi, Wicke, Martin, Ma, Xiao, Bilal, Taylan, Eltyshev, Evgenii, Balle, Daniel, Martin, Nina, Cate, Hardie, Manyika, James, Amiri, Keyvan, Kim, Yelin, Xiong, Xi, Kang, Kai, 
Luisier, Florian, Tripuraneni, Nilesh, Madras, David, Guo, Mandy, Waters, Austin, Wang, Oliver, Ainslie, Joshua, Baldridge, Jason, Zhang, Han, Pruthi, Garima, Bauer, Jakob, Yang, Feng, Mansour, Riham, Gelman, Jason, Xu, Yang, Polovets, George, Liu, Ji, Cai, Honglong, Chen, Warren, Sheng, XiangHai, Xue, Emily, Ozair, Sherjil, Yu, Adams, Angermueller, Christof, Li, Xiaowei, Wang, Weiren, Wiesinger, Julia, Koukoumidis, Emmanouil, Tian, Yuan, Iyer, Anand, Gurumurthy, Madhu, Goldenson, Mark, Shah, Parashar, Blake, MK, Yu, Hongkun, Urbanowicz, Anthony, Palomaki, Jennimaria, Fernando, Chrisantha, Brooks, Kevin, Durden, Ken, Mehta, Harsh, Momchev, Nikola, Rahimtoroghi, Elahe, Georgaki, Maria, Raul, Amit, Ruder, Sebastian, Redshaw, Morgan, Lee, Jinhyuk, Jalan, Komal, Li, Dinghua, Perng, Ginger, Hechtman, Blake, Schuh, Parker, Nasr, Milad, Chen, Mia, Milan, Kieran, Mikulik, Vladimir, Strohman, Trevor, Franco, Juliana, Green, Tim, Hassabis, Demis, Kavukcuoglu, Koray, Dean, Jeffrey, Vinyals, Oriol
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.
FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation
Riley, Parker, Dozat, Timothy, Botha, Jan A., Garcia, Xavier, Garrette, Dan, Riesa, Jason, Firat, Orhan, Constant, Noah
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models. Our dataset and evaluation code are publicly available: https://bit.ly/frmt-task
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Zhang, Yu, Han, Wei, Qin, James, Wang, Yongqiang, Bapna, Ankur, Chen, Zhehuai, Chen, Nanxin, Li, Bo, Axelrod, Vera, Wang, Gary, Meng, Zhong, Hu, Ke, Rosenberg, Andrew, Prabhavalkar, Rohit, Park, Daniel S., Haghani, Parisa, Riesa, Jason, Perng, Ginger, Soltau, Hagen, Strohman, Trevor, Ramabhadran, Bhuvana, Sainath, Tara, Moreno, Pedro, Chiu, Chung-Cheng, Schalkwyk, Johan, Beaufays, Françoise, Wu, Yonghui
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model [1], our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
Multimodal Modeling For Spoken Language Identification
Bharadwaj, Shikhar, Ma, Min, Vashishth, Shikhar, Bapna, Ankur, Ganapathy, Sriram, Axelrod, Vera, Dalmia, Siddharth, Han, Wei, Zhang, Yu, van Esch, Daan, Ritchie, Sandy, Talukdar, Partha, Riesa, Jason
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however, in the case of video data, there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification. Our study reveals that metadata such as video title, description, and geographic location provide substantial information to identify the spoken language of the multimedia recording. We conduct experiments using two diverse public datasets of YouTube videos, and obtain state-of-the-art results on the language identification task. We additionally conduct an ablation study that describes the distinct contribution of each modality for language recognition.
SQuId: Measuring Speech Naturalness in Many Languages
Sellam, Thibault, Bapna, Ankur, Camp, Joshua, Mackinnon, Diana, Parikh, Ankur P., Riesa, Jason
Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales, the largest effort of this type to date. The main insight is that training one model on many locales consistently outperforms mono-locale baselines. We present our task and the model, and show that it outperforms a competitive baseline based on w2v-BERT and VoiceMOS by 50.0%. We then demonstrate the effectiveness of cross-locale transfer during fine-tuning and highlight its effect on zero-shot locales, i.e., locales for which there is no fine-tuning data. Through a series of analyses, we highlight the role of non-linguistic effects such as sound artifacts in cross-locale transfer. Finally, we present the effect of our design decisions, e.g., model size, pre-training diversity, and language rebalancing, with several ablation experiments.
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
Tsai, Henry, Ooi, Jayden, Ferng, Chun-Sung, Chung, Hyung Won, Riesa, Jason
Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient algorithm to search for fast models while maintaining model quality. We describe a novel approach to decompose the Transformer architecture into smaller components, and propose a sampling-based one-shot architecture search method to find an optimal model for inference. The model search process is more efficient than alternatives, adding only a small overhead to training time. By applying our methods to BERT-base architectures, we achieve 10% to 30% speedup for pre-trained BERT and 70% speedup on top of a previous state-of-the-art distilled BERT model on Cloud TPU-v2 with a generally acceptable drop in performance.