Elser, Veit
Humanity's Last Exam
Phan, Long, Gatti, Alice, Han, Ziwen, Li, Nathaniel, Hu, Josephina, Zhang, Hugh, Zhang, Chen Bo Calvin, Shaaban, Mohamed, Ling, John, Shi, Sean, Choi, Michael, Agrawal, Anish, Chopra, Arnav, Khoja, Adam, Kim, Ryan, Ren, Richard, Hausenloy, Jason, Zhang, Oliver, Mazeika, Mantas, Nguyen, Tung, Anderson, Daron, Shah, Imad Ali, Doroshenko, Mikhail, Stokes, Alun Cennyth, Mahmood, Mobeen, Lee, Jaeho, Pokutnyi, Oleksandr, Iskra, Oleg, Wang, Jessica P., Gerbicz, Robert, Levin, John-Clark, Popov, Serguei, Feng, Fiona, Feng, Steven Y., Zhao, Haoran, Yu, Michael, Gangal, Varun, Zou, Chelsea, Wang, Zihan, Kazakov, Mstyslav, Galgon, Geoff, Schmitt, Johannes, Sanchez, Alvaro, Lee, Yongki, Yeadon, Will, Sauers, Scott, Roth, Marc, Agu, Chidozie, Riis, Søren, Giska, Fabian, Utpala, Saiteja, Cheatom, Antrell, Giboney, Zachary, Goshu, Gashaw M., Crowson, Sarah-Jane, Naiya, Mohinder Maheshbhai, Burns, Noah, Finke, Lennart, Cheng, Zerui, Park, Hyunwoo, Fournier-Facio, Francesco, Zampese, Jennifer, Wydallis, John, Wydallis, John B., Hoerr, Ryan G., Nandor, Mark, Gehrunger, Tim, Cai, Jiaqi, McCarty, Ben, Nam, Jungbae, Taylor, Edwin, Jin, Jun, Loume, Gautier Abou, Cao, Hangrui, Garretson, Alexis C, Sileo, Damien, Ren, Qiuyu, Cojoc, Doru, Arkhipov, Pavel, Qazi, Usman, Bacho, Aras, Li, Lianghui, Motwani, Sumeet, de Witt, Christian Schroeder, Kopylov, Alexei, Veith, Johannes, Singer, Eric, Rissone, Paolo, Jin, Jaehyeok, Shi, Jack Wei Lun, Willcocks, Chris G., Prabhu, Ameya, Tang, Longke, Zhou, Kevin, Santos, Emily de Oliveira, Maksimov, Andrey Pupasov, Vendrow, Edward, Zenitani, Kengo, Robinson, Joshua, Mikov, Aleksandar, Guillod, Julien, Li, Yuqi, Pageler, Ben, Vendrow, Joshua, Kuchkin, Vladyslav, Marion, Pierre, Efremov, Denis, Lynch, Jayson, Liang, Kaiqu, Gritsevskiy, Andrew, Martinez, Dakotah, Crispino, Nick, Zvonkine, Dimitri, Fraga, Natanael Wildner, Soori, Saeed, Press, Ori, Tang, Henry, Salazar, Julian, Green, Sean R., Brüssel, Lina, Twayana, Moon, Dieuleveut, Aymeric, Rogers, T. Ryan, Zhang, Wenjin, Finocchio, Ross, Li, Bikun, Yang, Jinzhou, Rao, Arun, Loiseau, Gabriel, Kalinin, Mikhail, Lukas, Marco, Manolescu, Ciprian, Stambaugh, Nate, Mishra, Subrata, Kamdoum, Ariel Ghislain Kemogne, Hogg, Tad, Jin, Alvin, Bosio, Carlo, Sun, Gongbo, Coppola, Brian P, Heidinger, Haline, Sayous, Rafael, Ivanov, Stefan, Cavanagh, Joseph M, Shen, Jiawei, Imperial, Joseph Marvin, Schwaller, Philippe, Senthilkuma, Shaipranesh, Bran, Andres M, Algaba, Andres, Verbeken, Brecht, Houte, Kelsey Van den, Van Der Sypt, Lynn, Noever, David, Schut, Lisa, Sucholutsky, Ilia, Zheltonozhskii, Evgenii, Yuan, Qiaochu, Lim, Derek, Stanley, Richard, Sivarajan, Shankar, Yang, Tong, Maar, John, Wykowski, Julian, Oller, Martí, Sandlin, Jennifer, Sahu, Anmol, Ardito, Cesare Giulio, Hu, Yuzheng, Dias, Felipe Meneguitti, Kreiman, Tobias, Rawal, Kaivalya, Vilchis, Tobias Garcia, Zu, Yuexuan, Lackner, Martin, Koppel, James, Nguyen, Jeremy, Antonenko, Daniil S., Chern, Steffi, Zhao, Bingchen, Arsene, Pierrot, Ivanov, Sergey, Poświata, Rafał, Wang, Chenguang, Li, Daofeng, Crisostomi, Donato, Dehghan, Ali, Achilleos, Andrea, Ambay, John Arnold, Myklebust, Benjamin, Sen, Archan, Perrella, David, Kaparov, Nurdin, Inlow, Mark H, Zang, Allen, Ramakrishnan, Kalyan, Orel, Daniil, Poritski, Vladislav, Ben-David, Shalev, Berger, Zachary, Whitfill, Parker, Foster, Michael, Munro, Daniel, Ho, Linh, Hava, Dan Bar, Kuchkin, Aleksey, Lauff, Robert, Holmes, David, Sommerhage, Frank, Zhang, Anji, Moat, Richard, Schneider, Keith, Pyda, Daniel, Kazibwe, Zakayo, Singh, Mukhwinder, Clarke, Don, Kim, Dae Hyun, Fish, Sara, Elser, Veit, Vilchis, Victor Efren Guadarrama, Klose, Immo, Demian, Christoph, Anantheswaran, Ujjwala, Zweiger, Adam, Albani, Guglielmo, Li, Jeffery, Daans, Nicolas, Radionov, Maksim, Rozhoň, Václav, Ginis, Vincent, Ma, Ziqiao, Stump, Christian, Platnick, Jacob, Nevirkovets, Volodymyr, Basler, Luke, Piccardo, Marco, Cohen, Niv, Singh, Virendra, Tkadlec, Josef, Rosu, Paul, Goldfarb, Alan, Padlewski, Piotr, Barzowski, Stanislaw, Montgomery, Kyle, Menezes, Aline, Patel, Arkil, Wang, Zixuan, Tucker-Foltz, Jamie, Stade, Jack, Grabb, Declan, Goertzen, Tom, Kazemi, Fereshteh, Milbauer, Jeremiah, Shukla, Abhishek, Elgnainy, Hossam, Labrador, Yan Carlos Leyva, He, Hao, Zhang, Ling, Givré, Alan, Wolff, Hew, Demir, Gözdenur, Aziz, Muhammad Fayez, Kaddar, Younesse, Ängquist, Ivar, Chen, Yanxu, Thornley, Elliott, Zhang, Robin, Pan, Jiayi, Terpin, Antonio, Muennighoff, Niklas, Schoelkopf, Hailey, Zheng, Eric, Carmi, Avishy, Shah, Jainam, Brown, Ethan D. L., Zhu, Kelin, Bartolo, Max, Wheeler, Richard, Ho, Andrew, Barkan, Shaul, Wang, Jiaqi, Stehberger, Martin, Kretov, Egor, Bradshaw, Peter, Heimonen, JP, Sridhar, Kaustubh, Hossain, Zaki, Akov, Ido, Makarychev, Yury, Tam, Joanna, Hoang, Hieu, Cunningham, David M., Goryachev, Vladimir, Patramanis, Demosthenes, Krause, Michael, Redenti, Andrew, Aldous, David, Lai, Jesyin, Coleman, Shannon, Xu, Jiangnan, Lee, Sangwon, Magoulas, Ilias, Zhao, Sandy, Tang, Ning, Cohen, Michael K., Carroll, Micah, Paradise, Orr, Kirchner, Jan Hendrik, Steinerberger, Stefan, Ovchynnikov, Maksym, Matos, Jason O., Shenoy, Adithya, Wang, Michael, Nie, Yuzhou, Giordano, Paolo, Petersen, Philipp, Sztyber-Betley, Anna, Faraboschi, Paolo, Riblet, Robin, Crozier, Jonathan, Halasyamani, Shiv, Pinto, Antonella, Verma, Shreyas, Joshi, Prashant, Meril, Eli, Yong, Zheng-Xin, Tee, Allison, Andréoletti, Jérémy, Weller, Orion, Singhal, Raghav, Zhang, Gang, Ivanov, Alexander, Khoury, Seri, Gustafsson, Nils, Mostaghimi, Hamid, Thaman, Kunvar, Chen, Qijia, Khánh, Tran Quoc, Loader, Jacob, Cavalleri, Stefano, Szlyk, Hannah, Brown, Zachary, Narayan, Himanshu, Roberts, Jonathan, Alley, William, Sun, Kunyang, Stendall, Ryan, Lamparth, Max, Reuel, Anka, Wang, Ting, Xu, Hanmeng, Hernández-Cámara, Pablo, Martin, Freddie, Preu, Thomas, Korbak, Tomek, Abramovitch, Marcus, Williamson, Dominic, Bosio, Ida, Chen, Ziye, Bálint, Biró, Lo, Eve J. Y., Nunes, Maria Inês S., Jiang, Yibo, Bari, M Saiful, Kassani, Peyman, Wang, Zihao, Ansarinejad, Behzad, Sun, Yewen, Durand, Stephane, Douville, Guillaume, Tordera, Daniel, Balabanian, George, Anderson, Earth, Kvistad, Lynna, Moyano, Alejandro José, Milliron, Hsiaoyun, Sakor, Ahmad, Eron, Murat, McAlister, Isaac C., O., Andrew Favre D., Shah, Shailesh, Zhou, Xiaoxiang, Kamalov, Firuz, Clark, Ronald, Abdoli, Sherwin, Santens, Tim, Wang, Harrison K, Chen, Evan, Tomasiello, Alessandro, De Luca, G. Bruno, Looi, Shi-Zhuo, Le, Vinh-Kha, Kolt, Noam, Mündler, Niels, Semler, Avi, Rodman, Emma, Drori, Jacob, Fossum, Carl J, Gloor, Luk, Jagota, Milind, Pradeep, Ronak, Fan, Honglu, Shah, Tej, Eicher, Jonathan, Chen, Michael, Thaman, Kushal, Merrill, William, Firsching, Moritz, Harris, Carter, Ciobâcă, Stefan, Gross, Jason, Pandey, Rohan, Gusev, Ilya, Jones, Adam, Agnihotri, Shashank, Zhelnov, Pavel, Usawasutsakorn, Siranut, Mofayezi, Mohammadreza, Piperski, Alexander, Carauleanu, Marc, Zhang, David K., Dobarskyi, Kostiantyn, Ler, Dylan, Leventov, Roman, Soroko, Ignat, Jansen, Thorben, Creighton, Scott, Lauer, Pascal, Duersch, Joshua, Taamazyan, Vage, Bezzi, Dario, Morak, Wiktor, Ma, Wenjie, Held, William, Huy, Tran Đuc, Xian, Ruicheng, Zebaze, Armel Randy, Mohamed, Mohanad, Leser, Julian Noah, Yuan, Michelle X, Yacar, Laila, Lengler, Johannes, Olszewska, Katarzyna, Shahrtash, Hossein, Oliveira, Edson, Jackson, Joseph W., Gonzalez, Daniel Espinosa, Zou, Andy, Chidambaram, Muthu, Manik, Timothy, Haffenden, Hector, Stander, Dashiell, Dasouqi, Ali, Shen, Alexander, Duc, Emilien, Golshani, Bita, Stap, David, Uzhou, Mikalai, Zhidkovskaya, Alina Borisovna, Lewark, Lukas, Rodriguez, Miguel Orbegozo, Vincze, Mátyás, Wehr, Dustin, Tang, Colin, Phillips, Shaun, Samuele, Fortuna, Muzhen, Jiang, Ekström, Fredrik, Hammon, Angela, Patel, Oam, Farhidi, Faraz, Medley, George, Mohammadzadeh, Forough, Peñaflor, Madellene, Kassahun, Haile, Friedrich, Alena, Sparrow, Claire, Perez, Rayner Hernandez, Sakal, Taom, Dhamane, Omkar, Mirabadi, Ali Khajegili, Hallman, Eric, Okutsu, Kenchi, Battaglia, Mike, Maghsoudimehrabani, Mohammad, Amit, Alon, Hulbert, Dave, Pereira, Roberto, Weber, Simon, Handoko, null, Peristyy, Anton, Malina, Stephen, Albanie, Samuel, Cai, Will, Mehkary, Mustafa, Aly, Rami, Reidegeld, Frank, Dick, Anna-Katharina, Friday, Cary, Sidhu, Jasdeep, Shapourian, Hassan, Kim, Wanyoung, Costa, Mariana, Gurdogan, Hubeyb, Weber, Brian, Kumar, Harsh, Jiang, Tong, Agarwal, Arunim, Ceconello, Chiara, Vaz, Warren S., Zhuang, Chao, Park, Haon, Tawfeek, Andrew R., Aggarwal, Daattavya, Kirchhof, Michael, Dai, Linjie, Kim, Evan, Ferret, Johan, Wang, Yuzhou, Yan, Minghao, Burdzy, Krzysztof, Zhang, Lixin, Franca, Antonio, Pham, Diana T., Loh, Kang Yong, Robinson, Joshua, Jackson, Abram, Gul, Shreen, Chhablani, Gunjan, Du, Zhehang, Cosma, Adrian, Colino, Jesus, White, Colin, Votava, Jacob, Vinnikov, Vladimir, Delaney, Ethan, Spelda, Petr, Stritecky, Vit, Shahid, Syed M., Mourrat, Jean-Christophe, Vetoshkin, Lavr, Sponselee, Koen, Bacho, Renas, de la Rosa, Florencia, Li, Xiuyu, Malod, Guillaume, Lang, Leon, Laurendeau, Julien, Kazakov, Dmitry, Adesanya, Fatimah, Portier, Julien, Hollom, Lawrence, Souza, Victor, Zhou, Yuchen Anna, Degorre, Julien, Yalın, Yiğit, Obikoya, Gbenga Daniel, Arnaboldi, Luca, Rai, null, Bigi, Filippo, Boscá, M. C., Shumar, Oleg, Bacho, Kaniuar, Clavier, Pierre, Recchia, Gabriel, Popescu, Mara, Shulga, Nikita, Tanwie, Ngefor Mildred, Peskoff, Denis, Lux, Thomas C. H., Rank, Ben, Ni, Colin, Brooks, Matthew, Yakimchyk, Alesia, Huanxu, null, Liu, null, Häggström, Olle, Verkama, Emil, Gundlach, Hans, Brito-Santana, Leonor, Amaro, Brian, Vajipey, Vivek, Grover, Rynaa, Fan, Yiyang, Silva, Gabriel Poesia Reis e, Xin, Linwei, Kratish, Yosi, Łucki, Jakub, Li, Wen-Ding, Gopi, Sivakanth, Caciolai, Andrea, Xu, Justin, Scaria, Kevin Joseph, Vargus, Freddie, Habibi, Farzad, Long, null, Lian, null, Rodolà, Emanuele, Robins, Jules, Cheng, Vincent, Fruhauff, Tony, Raynor, Brad, Qi, Hao, Jiang, Xi, Segev, Ben, Fan, Jingxuan, Martinson, Sarah, Wang, Erik Y., Hausknecht, Kaylie, Brenner, Michael P., Mao, Mao, Zhang, Xinyu, Avagian, David, Scipio, Eshawn Jessica, Ragoler, Alon, Tan, Justin, Sims, Blake, Plecnik, Rebeka, Kirtland, Aaron, Bodur, Omer Faruk, Shinde, D. P., Adoul, Zahra, Zekry, Mohamed, Karakoc, Ali, Santos, Tania C. B., Shamseldeen, Samir, Karim, Loukmane, Liakhovitskaia, Anna, Resman, Nate, Farina, Nicholas, Gonzalez, Juan Carlos, Maayan, Gabe, Hoback, Sarah, Pena, Rodrigo De Oliveira, Sherman, Glen, Kelley, Elizabeth, Mariji, Hodjat, Pouriamanesh, Rasoul, Wu, Wentao, Mendoza, Sandra, Alarab, Ismail, Cole, Joshua, Ferreira, Danyelle, Johnson, Bryan, Safdari, Mohammad, Dai, Liangti, Arthornthurasuk, Siriphan, Pronin, Alexey, Fan, Jing, Ramirez-Trinidad, Angel, Cartwright, Ashley, Pottmaier, Daphiny, Taheri, Omid, Outevsky, David, Stepanic, Stanley, Perry, Samuel, Askew, Luke, Rodríguez, Raúl Adrián Huerta, Minissi, Ali M. R., Ali, Sam, Lorena, Ricardo, Iyer, Krishnamurthy, Fasiludeen, Arshad Anil, Salauddin, Sk Md, Islam, Murat, Gonzalez, Juan, Ducey, Josh, Somrak, Maja, Mavroudis, Vasilios, Vergo, Eric, Qin, Juehang, Borbás, Benjámin, Chu, Eric, Lindsey, Jack, Radhakrishnan, Anil, Jallon, Antoine, McInnis, I. M. J., Kumar, Pawan, Goswami, Laxman Prasad, Bugas, Daniel, Heydari, Nasser, Jeanplong, Ferenc, Apronti, Archimedes, Galal, Abdallah, Ze-An, Ng, Singh, Ankit, Xavier, Joan of Arc, Agarwal, Kanu Priya, Berkani, Mohammed, Junior, Benedito Alves de Oliveira, Malishev, Dmitry, Remy, Nicolas, Hartman, Taylor D., Tarver, Tim, Mensah, Stephen, Gimenez, Javier, Montecillo, Roselynn Grace, Campbell, Russell, Sharma, Asankhaya, Meer, Khalida, Alapont, Xavier, Patil, Deepakkumar, Maheshwari, Rajat, Dendane, Abdelkader, Shukla, Priti, Bogdanov, Sergei, Möller, Sören, Siddiqi, Muhammad Rehan, Saxena, Prajvi, Gupta, Himanshu, Enyekwe, Innocent, P, Ragavendran V, EL-Wasif, Zienab, Maksapetyan, Aleksandr, Rossbach, Vivien, Harjadi, Chris, Bahaloohoreh, Mohsen, Bian, Song, Lai, John, Uro, Justine Leon, Bateman, Greg, Sayed, Mohamed, Menshawy, Ahmed, Duclosel, Darling, Jain, Yashaswini, Aaron, Ashley, Tiryakioglu, Murat, Siddh, Sheeshram, Krenek, Keith, Hoover, Alex, McGowan, Joseph, Patwardhan, Tejal, Yue, Summer, Wang, Alexandr, Hendrycks, Dan
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and poor calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To ground research and policymaking in a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
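The abstract reports both accuracy and calibration. As a minimal sketch of how such numbers can be computed from graded answers, the code takes each question's correctness (1 or 0) and the model's stated confidence in [0, 1] and returns accuracy and a binned root-mean-square (RMS) calibration error. The function names, the 10-bin scheme, and the choice of RMS calibration error are assumptions for illustration, not a description of HLE's official scoring code.

```python
import math
from typing import List, Sequence, Tuple


def accuracy(correct: Sequence[int]) -> float:
    """Fraction of questions graded correct (1) rather than incorrect (0)."""
    return sum(correct) / len(correct)


def rms_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], num_bins: int = 10
) -> float:
    """Binned RMS gap between stated confidence and empirical accuracy.

    Assumption: confidences lie in [0, 1]. Each bin's squared gap is
    weighted by the fraction of all questions that fall into that bin.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for conf, y in zip(confidences, correct):
        # Map confidence to a bin index; a confidence of 1.0 goes into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, y))

    total = len(confidences)
    squared_error = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        bucket_acc = sum(y for _, y in bucket) / len(bucket)
        squared_error += len(bucket) / total * (mean_conf - bucket_acc) ** 2
    return math.sqrt(squared_error)
```

For example, a model that reports 90% confidence on four questions but answers only one correctly has accuracy 0.25 and an RMS calibration error of about 0.65; a model that is always correct at full confidence scores 0.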
A transparent approach to data representation
Deyo, Sean, Elser, Veit
In 2006 Netflix released a data set -- roughly 100 million ratings of 17770 titles, given by 480189 viewers -- and posed a challenge: use this training data to predict the ratings in a separate, hidden set of ratings involving the same movies and viewers. The first to do so with a root-mean-square prediction error (RMSE) at least 10% lower than that of Netflix's own system would receive a …
We take inspiration from the non-negative matrix factorization (NMF) problem. In NMF, one large m × n matrix M with non-negative values is factored as a product of two smaller non-negative matrices R and C of size m × l and l × n, respectively (where l ≪ m, n). Imagining the set of ratings as the M matrix, with each row corresponding to a viewer and each column corresponding to a movie, one can think of each row of R as an attribute vector for the corresponding viewer.
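Since the paper takes NMF as its starting point, here is a minimal sketch of plain NMF, M ≈ R C with R of size m × l and C of size l × n, using the standard Lee-Seung multiplicative updates for the squared (Frobenius) reconstruction error. This is the textbook algorithm the abstract says it draws inspiration from, not the authors' own method; the function names, the random initialization, and the small `eps` guard against division by zero are illustrative assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frobenius_error(M: Matrix, R: Matrix, C: Matrix) -> float:
    """||M - R C||_F, the quantity the updates drive down."""
    P = matmul(R, C)
    return sum((m - p) ** 2 for mrow, prow in zip(M, P) for m, p in zip(mrow, prow)) ** 0.5


def nmf(M: Matrix, rank: int, iterations: int = 500, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor non-negative M (m x n) as R (m x rank) times C (rank x n)."""
    rng = random.Random(seed)
    m, n = len(M), len(M[0])
    # Strictly positive start, so multiplicative updates keep every entry non-negative.
    R = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(m)]
    C = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # C <- C * (R^T M) / (R^T R C), elementwise
        Rt = transpose(R)
        num, den = matmul(Rt, M), matmul(matmul(Rt, R), C)
        C = [[c * nu / (d + eps) for c, nu, d in zip(crow, nrow, drow)]
             for crow, nrow, drow in zip(C, num, den)]
        # R <- R * (M C^T) / (R C C^T), elementwise
        Ct = transpose(C)
        num, den = matmul(M, Ct), matmul(R, matmul(C, Ct))
        R = [[r * nu / (d + eps) for r, nu, d in zip(rrow, nrow, drow)]
             for rrow, nrow, drow in zip(R, num, den)]
    return R, C
```

On an exactly rank-1 matrix such as [[1, 1, 2], [2, 2, 4], [3, 3, 6]] with rank 1, the multiplicative updates coincide with alternating least squares and recover M up to rounding. For real rating data the matrix is mostly unobserved, so a practical variant would restrict the error to observed entries, which this sketch does not do.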
A logical word embedding for learning grammar
Deyo, Sean, Elser, Veit
We introduce the logical grammar embedding (LGE), a model, inspired by pregroup grammars and categorial grammars, that enables unsupervised inference of lexical categories and syntactic rules from a corpus of text. LGE produces comprehensible output summarizing its inferences, has a completely transparent process for producing novel sentences, and can learn from as few as a hundred sentences.
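Pregroup grammars, one of LGE's stated inspirations, assign each word a string of simple types and accept a sentence when adjacent types cancel down to the sentence type s, using the contractions x^l x → 1 and x x^r → 1. The following sketch shows that textbook reduction on a hypothetical three-word lexicon. It illustrates pregroup grammars in general, not the LGE model or its inference procedure, and its greedy left-to-right reduction is a simplification that is adequate for this lexicon but is not a complete pregroup parser.

```python
from typing import Dict, List, Tuple

# A simple type is (base, adjoint order): 0 = plain x, -1 = left adjoint x^l, +1 = right adjoint x^r.
SimpleType = Tuple[str, int]

# Hypothetical toy lexicon: nouns have type n, transitive verbs have type n^r s n^l.
LEXICON: Dict[str, List[SimpleType]] = {
    "Alice": [("n", 0)],
    "Bob": [("n", 0)],
    "sees": [("n", 1), ("s", 0), ("n", -1)],
}


def contracts(left: SimpleType, right: SimpleType) -> bool:
    """Adjacent types (x, z)(x, z + 1) cancel; this covers x^l x -> 1 and x x^r -> 1."""
    return left[0] == right[0] and right[1] == left[1] + 1


def reduce_types(types: List[SimpleType]) -> List[SimpleType]:
    """Greedy stack reduction: cancel each new type against the top of the stack if possible."""
    stack: List[SimpleType] = []
    for t in types:
        if stack and contracts(stack[-1], t):
            stack.pop()
        else:
            stack.append(t)
    return stack


def is_sentence(words: List[str]) -> bool:
    """A word string is grammatical if its concatenated types reduce to exactly s."""
    types = [t for w in words for t in LEXICON[w]]
    return reduce_types(types) == [("s", 0)]
```

With this lexicon, "Alice sees Bob" reduces as n · n^r s n^l · n → s and is accepted, while "sees Alice Bob" leaves uncancelled types and is rejected.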
Learning grammar with a divide-and-concur neural network
Deyo, Sean, Elser, Veit
We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable -- one can read off from a solution how to construct grammatically valid sentences. Another advantage of our approach is the ability to infer meaningful grammatical rules from just a few sentences, compared to the hundreds of gigabytes of training data many other models employ. We demonstrate several ways of applying our approach: classifying words and inferring a grammar from scratch, taking an existing grammar and refining its categories and rules, and taking an existing grammar and expanding its lexicon as it encounters new words in new data.
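The divide-and-concur idea can be illustrated on a toy continuous problem. In this minimal sketch, each constraint gets its own replica of the unknown vector: the divide step projects every replica onto its own constraint, the concur step replaces all replicas by their average, and the two projections are combined by the difference-map update with β = 1 (the Douglas-Rachford form). The hyperplane constraints, the zero initialization, and all function names are hypothetical choices for illustration; this is not the paper's grammar model, whose parameters are discrete.

```python
from typing import List, Tuple

Vec = List[float]
Hyperplane = Tuple[Vec, float]  # (a, b) encodes the constraint a . x = b


def dot(u: Vec, v: Vec) -> float:
    return sum(x * y for x, y in zip(u, v))


def project_hyperplane(x: Vec, plane: Hyperplane) -> Vec:
    """Nearest point to x that satisfies a . x = b."""
    a, b = plane
    t = (b - dot(a, x)) / dot(a, a)
    return [xi + t * ai for xi, ai in zip(x, a)]


def divide(replicas: List[Vec], planes: List[Hyperplane]) -> List[Vec]:
    """Divide: project each replica onto its own constraint, independently."""
    return [project_hyperplane(x, p) for x, p in zip(replicas, planes)]


def concur(replicas: List[Vec]) -> List[Vec]:
    """Concur: replace every replica by the average of all replicas."""
    k = len(replicas)
    mean = [sum(col) / k for col in zip(*replicas)]
    return [list(mean) for _ in range(k)]


def divide_and_concur(planes: List[Hyperplane], dim: int, iterations: int = 200) -> Vec:
    """Iterate x <- x + P_divide(2 P_concur(x) - x) - P_concur(x)."""
    x = [[0.0] * dim for _ in planes]  # one replica of the unknown per constraint
    for _ in range(iterations):
        p_c = concur(x)
        reflected = [[2 * c - xi for c, xi in zip(crow, xrow)] for crow, xrow in zip(p_c, x)]
        p_d = divide(reflected, planes)
        x = [[xi + d - c for xi, d, c in zip(xrow, drow, crow)]
             for xrow, drow, crow in zip(x, p_d, p_c)]
    return concur(x)[0]  # concurred estimate of a point satisfying every constraint
```

With the two lines x + y = 1 and x - y = 0, the concurred estimate converges to (0.5, 0.5). Because these constraints are convex, convergence is guaranteed here; with discrete constraints such as those described in the abstract, the projections are onto non-convex sets and that guarantee no longer applies.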
Monotone Learning with Rectifier Networks
Elser, Veit, Schmidt, Dan, Yedidia, Jonathan
We introduce a new neural network model, together with a tractable and monotone online learning algorithm. Our model describes feed-forward networks for classification, with one output node for each class. The only nonlinear operation is rectification using a ReLU function with a bias. However, there is a rectifier on every edge rather than at the nodes of the network. There are also weights, but these are positive, static, and associated with the nodes. Our "rectified wire" networks are able to represent arbitrary Boolean functions. Only the bias parameters, on the edges of the network, are learned. Another departure from standard neural networks is that the loss function is replaced by a constraint: the value of the output node associated with the correct class should be zero. Our model has the property that the exact norm-minimizing parameter update required to correctly classify a training item is the solution to a quadratic program that can be computed with a few passes through the network. We demonstrate a training algorithm using this update, called sequential deactivation (SDA), on MNIST and some synthetic datasets. Upon adopting a natural choice for the nodal weights, SDA has no hyperparameters other than those describing the network structure. Our experiments explore behavior with respect to network size and depth in a family of sparse expander networks.
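To make edge-level rectification concrete, here is a minimal forward-pass sketch. The abstract does not give the exact formula, so this code assumes (hypothetically) that edge (u, v) transmits max(0, x_u - b_uv) and that each non-input node multiplies the sum of its incoming edge signals by its fixed positive weight. The constraint check encodes the abstract's rule that the correct class's output node should be zero. The SDA learning update itself, a quadratic program, is not sketched.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source node, target node)


def rectified_wire_forward(
    inputs: Dict[int, float],
    edges: List[Edge],
    bias: Dict[Edge, float],
    weight: Dict[int, float],
    order: List[int],
) -> Dict[int, float]:
    """Evaluate a feed-forward net with a biased ReLU on every edge.

    Assumed form (hypothetical): edge (u, v) carries max(0, x_u - bias[(u, v)]);
    a non-input node v outputs weight[v] * (sum of its incoming edge signals).
    `order` must list nodes so that every source precedes its targets.
    """
    values = dict(inputs)
    for v in order:
        if v in values:
            continue  # input node: value already given
        incoming = (max(0.0, values[u] - bias[(u, t)]) for (u, t) in edges if t == v)
        values[v] = weight[v] * sum(incoming)
    return values


def satisfies_constraint(values: Dict[int, float], correct_output: int) -> bool:
    """Constraint from the abstract: the output node of the correct class is zero."""
    return values[correct_output] == 0.0


# Hypothetical two-input, two-class example: nodes 0, 1 are inputs; 2, 3 are class outputs.
edges = [(0, 2), (1, 2), (0, 3), (1, 3)]
bias = {(0, 2): 0.0, (1, 2): 0.5, (0, 3): 1.0, (1, 3): 0.3}
weight = {2: 1.0, 3: 1.0}
vals = rectified_wire_forward({0: 0.8, 1: 0.2}, edges, bias, weight, order=[0, 1, 2, 3])
```

In this example both edges into node 3 are inactive, because each input falls below its edge's bias, so node 3 outputs zero and an item of that class satisfies the constraint, while node 2 outputs 0.8.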