Wang, Alexandr
Superintelligence Strategy: Expert Version
Hendrycks, Dan, Schmidt, Eric, Wang, Alexandr
Rapid advances in AI are beginning to reshape national security. Destabilizing AI developments could rupture the balance of power and raise the odds of great-power conflict, while widespread proliferation of capable AI hackers and virologists would lower barriers for rogue actors to cause catastrophe. Superintelligence -- AI vastly better than humans at nearly all cognitive tasks -- is now anticipated by AI researchers. Just as nations once developed nuclear strategies to secure their survival, we now need a coherent superintelligence strategy to navigate a new period of transformative change. We introduce the concept of Mutual Assured AI Malfunction (MAIM): a deterrence regime resembling nuclear mutual assured destruction (MAD) where any state's aggressive bid for unilateral AI dominance is met with preventive sabotage by rivals. Given the relative ease of sabotaging a destabilizing AI project -- through interventions ranging from covert cyberattacks to potential kinetic strikes on datacenters -- MAIM already describes the strategic picture AI superpowers find themselves in. Alongside this, states can increase their competitiveness by bolstering their economies and militaries through AI, and they can engage in nonproliferation to rogue actors to keep weaponizable AI capabilities out of their hands. Taken together, the three-part framework of deterrence, nonproliferation, and competitiveness outlines a robust strategy to superintelligence in the years ahead.
Humanity's Last Exam
Phan, Long, Gatti, Alice, Han, Ziwen, Li, Nathaniel, Hu, Josephina, Zhang, Hugh, Zhang, Chen Bo Calvin, Shaaban, Mohamed, Ling, John, Shi, Sean, Choi, Michael, Agrawal, Anish, Chopra, Arnav, Khoja, Adam, Kim, Ryan, Ren, Richard, Hausenloy, Jason, Zhang, Oliver, Mazeika, Mantas, Nguyen, Tung, Anderson, Daron, Shah, Imad Ali, Doroshenko, Mikhail, Stokes, Alun Cennyth, Mahmood, Mobeen, Lee, Jaeho, Pokutnyi, Oleksandr, Iskra, Oleg, Wang, Jessica P., Gerbicz, Robert, Levin, John-Clark, Popov, Serguei, Feng, Fiona, Feng, Steven Y., Zhao, Haoran, Yu, Michael, Gangal, Varun, Zou, Chelsea, Wang, Zihan, Kazakov, Mstyslav, Galgon, Geoff, Schmitt, Johannes, Sanchez, Alvaro, Lee, Yongki, Yeadon, Will, Sauers, Scott, Roth, Marc, Agu, Chidozie, Riis, Sรธren, Giska, Fabian, Utpala, Saiteja, Cheatom, Antrell, Giboney, Zachary, Goshu, Gashaw M., Crowson, Sarah-Jane, Naiya, Mohinder Maheshbhai, Burns, Noah, Finke, Lennart, Cheng, Zerui, Park, Hyunwoo, Fournier-Facio, Francesco, Zampese, Jennifer, Wydallis, John, Wydallis, John B., Hoerr, Ryan G., Nandor, Mark, Gehrunger, Tim, Cai, Jiaqi, McCarty, Ben, Nam, Jungbae, Taylor, Edwin, Jin, Jun, Loume, Gautier Abou, Cao, Hangrui, Garretson, Alexis C, Sileo, Damien, Ren, Qiuyu, Cojoc, Doru, Arkhipov, Pavel, Qazi, Usman, Bacho, Aras, Li, Lianghui, Motwani, Sumeet, de Witt, Christian Schroeder, Kopylov, Alexei, Veith, Johannes, Singer, Eric, Rissone, Paolo, Jin, Jaehyeok, Shi, Jack Wei Lun, Willcocks, Chris G., Prabhu, Ameya, Tang, Longke, Zhou, Kevin, Santos, Emily de Oliveira, Maksimov, Andrey Pupasov, Vendrow, Edward, Zenitani, Kengo, Robinson, Joshua, Mikov, Aleksandar, Guillod, Julien, Li, Yuqi, Pageler, Ben, Vendrow, Joshua, Kuchkin, Vladyslav, Marion, Pierre, Efremov, Denis, Lynch, Jayson, Liang, Kaiqu, Gritsevskiy, Andrew, Martinez, Dakotah, Crispino, Nick, Zvonkine, Dimitri, Fraga, Natanael Wildner, Soori, Saeed, Press, Ori, Tang, Henry, Salazar, Julian, Green, Sean R., Brรผssel, Lina, Twayana, Moon, Dieuleveut, Aymeric, Rogers, T. Ryan, Zhang, Wenjin, Finocchio, Ross, Li, Bikun, Yang, Jinzhou, Rao, Arun, Loiseau, Gabriel, Kalinin, Mikhail, Lukas, Marco, Manolescu, Ciprian, Stambaugh, Nate, Mishra, Subrata, Kamdoum, Ariel Ghislain Kemogne, Hogg, Tad, Jin, Alvin, Bosio, Carlo, Sun, Gongbo, Coppola, Brian P, Heidinger, Haline, Sayous, Rafael, Ivanov, Stefan, Cavanagh, Joseph M, Shen, Jiawei, Imperial, Joseph Marvin, Schwaller, Philippe, Senthilkuma, Shaipranesh, Bran, Andres M, Algaba, Andres, Verbeken, Brecht, Houte, Kelsey Van den, Van Der Sypt, Lynn, Noever, David, Schut, Lisa, Sucholutsky, Ilia, Zheltonozhskii, Evgenii, Yuan, Qiaochu, Lim, Derek, Stanley, Richard, Sivarajan, Shankar, Yang, Tong, Maar, John, Wykowski, Julian, Oller, Martรญ, Sandlin, Jennifer, Sahu, Anmol, Ardito, Cesare Giulio, Hu, Yuzheng, Dias, Felipe Meneguitti, Kreiman, Tobias, Rawal, Kaivalya, Vilchis, Tobias Garcia, Zu, Yuexuan, Lackner, Martin, Koppel, James, Nguyen, Jeremy, Antonenko, Daniil S., Chern, Steffi, Zhao, Bingchen, Arsene, Pierrot, Ivanov, Sergey, Poลwiata, Rafaล, Wang, Chenguang, Li, Daofeng, Crisostomi, Donato, Dehghan, Ali, Achilleos, Andrea, Ambay, John Arnold, Myklebust, Benjamin, Sen, Archan, Perrella, David, Kaparov, Nurdin, Inlow, Mark H, Zang, Allen, Ramakrishnan, Kalyan, Orel, Daniil, Poritski, Vladislav, Ben-David, Shalev, Berger, Zachary, Whitfill, Parker, Foster, Michael, Munro, Daniel, Ho, Linh, Hava, Dan Bar, Kuchkin, Aleksey, Lauff, Robert, Holmes, David, Sommerhage, Frank, Zhang, Anji, Moat, Richard, Schneider, Keith, Pyda, Daniel, Kazibwe, Zakayo, Singh, Mukhwinder, Clarke, Don, Kim, Dae Hyun, Fish, Sara, Elser, Veit, Vilchis, Victor Efren Guadarrama, Klose, Immo, Demian, Christoph, Anantheswaran, Ujjwala, Zweiger, Adam, Albani, Guglielmo, Li, Jeffery, Daans, Nicolas, Radionov, Maksim, Rozhoล, Vรกclav, Ginis, Vincent, Ma, Ziqiao, Stump, Christian, Platnick, Jacob, Nevirkovets, Volodymyr, Basler, Luke, Piccardo, Marco, Cohen, Niv, Singh, Virendra, Tkadlec, Josef, Rosu, Paul, Goldfarb, Alan, Padlewski, Piotr, Barzowski, Stanislaw, Montgomery, Kyle, Menezes, Aline, Patel, Arkil, Wang, Zixuan, Tucker-Foltz, Jamie, Stade, Jack, Grabb, Declan, Goertzen, Tom, Kazemi, Fereshteh, Milbauer, Jeremiah, Shukla, Abhishek, Elgnainy, Hossam, Labrador, Yan Carlos Leyva, He, Hao, Zhang, Ling, Givrรฉ, Alan, Wolff, Hew, Demir, Gรถzdenur, Aziz, Muhammad Fayez, Kaddar, Younesse, รngquist, Ivar, Chen, Yanxu, Thornley, Elliott, Zhang, Robin, Pan, Jiayi, Terpin, Antonio, Muennighoff, Niklas, Schoelkopf, Hailey, Zheng, Eric, Carmi, Avishy, Shah, Jainam, Brown, Ethan D. L., Zhu, Kelin, Bartolo, Max, Wheeler, Richard, Ho, Andrew, Barkan, Shaul, Wang, Jiaqi, Stehberger, Martin, Kretov, Egor, Bradshaw, Peter, Heimonen, JP, Sridhar, Kaustubh, Hossain, Zaki, Akov, Ido, Makarychev, Yury, Tam, Joanna, Hoang, Hieu, Cunningham, David M., Goryachev, Vladimir, Patramanis, Demosthenes, Krause, Michael, Redenti, Andrew, Aldous, David, Lai, Jesyin, Coleman, Shannon, Xu, Jiangnan, Lee, Sangwon, Magoulas, Ilias, Zhao, Sandy, Tang, Ning, Cohen, Michael K., Carroll, Micah, Paradise, Orr, Kirchner, Jan Hendrik, Steinerberger, Stefan, Ovchynnikov, Maksym, Matos, Jason O., Shenoy, Adithya, Wang, Michael, Nie, Yuzhou, Giordano, Paolo, Petersen, Philipp, Sztyber-Betley, Anna, Faraboschi, Paolo, Riblet, Robin, Crozier, Jonathan, Halasyamani, Shiv, Pinto, Antonella, Verma, Shreyas, Joshi, Prashant, Meril, Eli, Yong, Zheng-Xin, Tee, Allison, Andrรฉoletti, Jรฉrรฉmy, Weller, Orion, Singhal, Raghav, Zhang, Gang, Ivanov, Alexander, Khoury, Seri, Gustafsson, Nils, Mostaghimi, Hamid, Thaman, Kunvar, Chen, Qijia, Khรกnh, Tran Quoc, Loader, Jacob, Cavalleri, Stefano, Szlyk, Hannah, Brown, Zachary, Narayan, Himanshu, Roberts, Jonathan, Alley, William, Sun, Kunyang, Stendall, Ryan, Lamparth, Max, Reuel, Anka, Wang, Ting, Xu, Hanmeng, Hernรกndez-Cรกmara, Pablo, Martin, Freddie, Preu, Thomas, Korbak, Tomek, Abramovitch, Marcus, Williamson, Dominic, Bosio, Ida, Chen, Ziye, Bรกlint, Birรณ, Lo, Eve J. Y., Nunes, Maria Inรชs S., Jiang, Yibo, Bari, M Saiful, Kassani, Peyman, Wang, Zihao, Ansarinejad, Behzad, Sun, Yewen, Durand, Stephane, Douville, Guillaume, Tordera, Daniel, Balabanian, George, Anderson, Earth, Kvistad, Lynna, Moyano, Alejandro Josรฉ, Milliron, Hsiaoyun, Sakor, Ahmad, Eron, Murat, McAlister, Isaac C., O., Andrew Favre D., Shah, Shailesh, Zhou, Xiaoxiang, Kamalov, Firuz, Clark, Ronald, Abdoli, Sherwin, Santens, Tim, Wang, Harrison K, Chen, Evan, Tomasiello, Alessandro, De Luca, G. Bruno, Looi, Shi-Zhuo, Le, Vinh-Kha, Kolt, Noam, Mรผndler, Niels, Semler, Avi, Rodman, Emma, Drori, Jacob, Fossum, Carl J, Gloor, Luk, Jagota, Milind, Pradeep, Ronak, Fan, Honglu, Shah, Tej, Eicher, Jonathan, Chen, Michael, Thaman, Kushal, Merrill, William, Firsching, Moritz, Harris, Carter, Ciobรขcฤ, Stefan, Gross, Jason, Pandey, Rohan, Gusev, Ilya, Jones, Adam, Agnihotri, Shashank, Zhelnov, Pavel, Usawasutsakorn, Siranut, Mofayezi, Mohammadreza, Piperski, Alexander, Carauleanu, Marc, Zhang, David K., Dobarskyi, Kostiantyn, Ler, Dylan, Leventov, Roman, Soroko, Ignat, Jansen, Thorben, Creighton, Scott, Lauer, Pascal, Duersch, Joshua, Taamazyan, Vage, Bezzi, Dario, Morak, Wiktor, Ma, Wenjie, Held, William, Huy, Tran ฤuc, Xian, Ruicheng, Zebaze, Armel Randy, Mohamed, Mohanad, Leser, Julian Noah, Yuan, Michelle X, Yacar, Laila, Lengler, Johannes, Olszewska, Katarzyna, Shahrtash, Hossein, Oliveira, Edson, Jackson, Joseph W., Gonzalez, Daniel Espinosa, Zou, Andy, Chidambaram, Muthu, Manik, Timothy, Haffenden, Hector, Stander, Dashiell, Dasouqi, Ali, Shen, Alexander, Duc, Emilien, Golshani, Bita, Stap, David, Uzhou, Mikalai, Zhidkovskaya, Alina Borisovna, Lewark, Lukas, Rodriguez, Miguel Orbegozo, Vincze, Mรกtyรกs, Wehr, Dustin, Tang, Colin, Phillips, Shaun, Samuele, Fortuna, Muzhen, Jiang, Ekstrรถm, Fredrik, Hammon, Angela, Patel, Oam, Farhidi, Faraz, Medley, George, Mohammadzadeh, Forough, Peรฑaflor, Madellene, Kassahun, Haile, Friedrich, Alena, Sparrow, Claire, Perez, Rayner Hernandez, Sakal, Taom, Dhamane, Omkar, Mirabadi, Ali Khajegili, Hallman, Eric, Okutsu, Kenchi, Battaglia, Mike, Maghsoudimehrabani, Mohammad, Amit, Alon, Hulbert, Dave, Pereira, Roberto, Weber, Simon, Handoko, null, Peristyy, Anton, Malina, Stephen, Albanie, Samuel, Cai, Will, Mehkary, Mustafa, Aly, Rami, Reidegeld, Frank, Dick, Anna-Katharina, Friday, Cary, Sidhu, Jasdeep, Shapourian, Hassan, Kim, Wanyoung, Costa, Mariana, Gurdogan, Hubeyb, Weber, Brian, Kumar, Harsh, Jiang, Tong, Agarwal, Arunim, Ceconello, Chiara, Vaz, Warren S., Zhuang, Chao, Park, Haon, Tawfeek, Andrew R., Aggarwal, Daattavya, Kirchhof, Michael, Dai, Linjie, Kim, Evan, Ferret, Johan, Wang, Yuzhou, Yan, Minghao, Burdzy, Krzysztof, Zhang, Lixin, Franca, Antonio, Pham, Diana T., Loh, Kang Yong, Robinson, Joshua, Jackson, Abram, Gul, Shreen, Chhablani, Gunjan, Du, Zhehang, Cosma, Adrian, Colino, Jesus, White, Colin, Votava, Jacob, Vinnikov, Vladimir, Delaney, Ethan, Spelda, Petr, Stritecky, Vit, Shahid, Syed M., Mourrat, Jean-Christophe, Vetoshkin, Lavr, Sponselee, Koen, Bacho, Renas, de la Rosa, Florencia, Li, Xiuyu, Malod, Guillaume, Lang, Leon, Laurendeau, Julien, Kazakov, Dmitry, Adesanya, Fatimah, Portier, Julien, Hollom, Lawrence, Souza, Victor, Zhou, Yuchen Anna, Degorre, Julien, Yalฤฑn, Yiฤit, Obikoya, Gbenga Daniel, Arnaboldi, Luca, Rai, null, Bigi, Filippo, Boscรก, M. C., Shumar, Oleg, Bacho, Kaniuar, Clavier, Pierre, Recchia, Gabriel, Popescu, Mara, Shulga, Nikita, Tanwie, Ngefor Mildred, Peskoff, Denis, Lux, Thomas C. H., Rank, Ben, Ni, Colin, Brooks, Matthew, Yakimchyk, Alesia, Huanxu, null, Liu, null, Hรคggstrรถm, Olle, Verkama, Emil, Gundlach, Hans, Brito-Santana, Leonor, Amaro, Brian, Vajipey, Vivek, Grover, Rynaa, Fan, Yiyang, Silva, Gabriel Poesia Reis e, Xin, Linwei, Kratish, Yosi, ลucki, Jakub, Li, Wen-Ding, Gopi, Sivakanth, Caciolai, Andrea, Xu, Justin, Scaria, Kevin Joseph, Vargus, Freddie, Habibi, Farzad, Long, null, Lian, null, Rodolร , Emanuele, Robins, Jules, Cheng, Vincent, Fruhauff, Tony, Raynor, Brad, Qi, Hao, Jiang, Xi, Segev, Ben, Fan, Jingxuan, Martinson, Sarah, Wang, Erik Y., Hausknecht, Kaylie, Brenner, Michael P., Mao, Mao, Zhang, Xinyu, Avagian, David, Scipio, Eshawn Jessica, Ragoler, Alon, Tan, Justin, Sims, Blake, Plecnik, Rebeka, Kirtland, Aaron, Bodur, Omer Faruk, Shinde, D. P., Adoul, Zahra, Zekry, Mohamed, Karakoc, Ali, Santos, Tania C. B., Shamseldeen, Samir, Karim, Loukmane, Liakhovitskaia, Anna, Resman, Nate, Farina, Nicholas, Gonzalez, Juan Carlos, Maayan, Gabe, Hoback, Sarah, Pena, Rodrigo De Oliveira, Sherman, Glen, Kelley, Elizabeth, Mariji, Hodjat, Pouriamanesh, Rasoul, Wu, Wentao, Mendoza, Sandra, Alarab, Ismail, Cole, Joshua, Ferreira, Danyelle, Johnson, Bryan, Safdari, Mohammad, Dai, Liangti, Arthornthurasuk, Siriphan, Pronin, Alexey, Fan, Jing, Ramirez-Trinidad, Angel, Cartwright, Ashley, Pottmaier, Daphiny, Taheri, Omid, Outevsky, David, Stepanic, Stanley, Perry, Samuel, Askew, Luke, Rodrรญguez, Raรบl Adriรกn Huerta, Minissi, Ali M. R., Ali, Sam, Lorena, Ricardo, Iyer, Krishnamurthy, Fasiludeen, Arshad Anil, Salauddin, Sk Md, Islam, Murat, Gonzalez, Juan, Ducey, Josh, Somrak, Maja, Mavroudis, Vasilios, Vergo, Eric, Qin, Juehang, Borbรกs, Benjรกmin, Chu, Eric, Lindsey, Jack, Radhakrishnan, Anil, Jallon, Antoine, McInnis, I. M. J., Kumar, Pawan, Goswami, Laxman Prasad, Bugas, Daniel, Heydari, Nasser, Jeanplong, Ferenc, Apronti, Archimedes, Galal, Abdallah, Ze-An, Ng, Singh, Ankit, Xavier, Joan of Arc, Agarwal, Kanu Priya, Berkani, Mohammed, Junior, Benedito Alves de Oliveira, Malishev, Dmitry, Remy, Nicolas, Hartman, Taylor D., Tarver, Tim, Mensah, Stephen, Gimenez, Javier, Montecillo, Roselynn Grace, Campbell, Russell, Sharma, Asankhaya, Meer, Khalida, Alapont, Xavier, Patil, Deepakkumar, Maheshwari, Rajat, Dendane, Abdelkader, Shukla, Priti, Bogdanov, Sergei, Mรถller, Sรถren, Siddiqi, Muhammad Rehan, Saxena, Prajvi, Gupta, Himanshu, Enyekwe, Innocent, P, Ragavendran V, EL-Wasif, Zienab, Maksapetyan, Aleksandr, Rossbach, Vivien, Harjadi, Chris, Bahaloohoreh, Mohsen, Bian, Song, Lai, John, Uro, Justine Leon, Bateman, Greg, Sayed, Mohamed, Menshawy, Ahmed, Duclosel, Darling, Jain, Yashaswini, Aaron, Ashley, Tiryakioglu, Murat, Siddh, Sheeshram, Krenek, Keith, Hoover, Alex, McGowan, Joseph, Patwardhan, Tejal, Yue, Summer, Wang, Alexandr, Hendrycks, Dan
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Li, Nathaniel, Pan, Alexander, Gopal, Anjali, Yue, Summer, Berrios, Daniel, Gatti, Alice, Li, Justin D., Dombrowski, Ann-Kathrin, Goel, Shashwat, Phan, Long, Mukobi, Gabriel, Helm-Burger, Nathan, Lababidi, Rassin, Justen, Lennart, Liu, Andrew B., Chen, Michael, Barrass, Isabelle, Zhang, Oliver, Zhu, Xiaoyuan, Tamirisa, Rishub, Bharathi, Bhrugu, Khoja, Adam, Zhao, Zhenqi, Herbert-Voss, Ariel, Breuer, Cort B., Marks, Samuel, Patel, Oam, Zou, Andy, Mazeika, Mantas, Wang, Zifan, Oswal, Palash, Lin, Weiran, Hunt, Adam A., Tienken-Harder, Justin, Shih, Kevin Y., Talley, Kemper, Guan, John, Kaplan, Russell, Steneker, Ian, Campbell, David, Jokubaitis, Brad, Levinson, Alex, Wang, Jean, Qian, William, Karmakar, Kallol Krishna, Basart, Steven, Fitz, Stephen, Levine, Mindy, Kumaraguru, Ponnurangam, Tupakula, Uday, Varadharajan, Vijay, Wang, Ruoyu, Shoshitaishvili, Yan, Ba, Jimmy, Esvelt, Kevin M., Wang, Alexandr, Hendrycks, Dan
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai