AITopics

2502.06807

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceDec-21-2024

OpenAI o1 System Card

OpenAI, null, :, null, Jaech, Aaron, Kalai, Adam, Lerer, Adam, Richardson, Adam, El-Kishky, Ahmed, Low, Aiden, Helyar, Alec, Madry, Aleksander, Beutel, Alex, Carney, Alex, Iftimie, Alex, Karpenko, Alex, Passos, Alex Tachard, Neitz, Alexander, Prokofiev, Alexander, Wei, Alexander, Tam, Allison, Bennett, Ally, Kumar, Ananya, Saraiva, Andre, Vallone, Andrea, Duberstein, Andrew, Kondrich, Andrew, Mishchenko, Andrey, Applebaum, Andy, Jiang, Angela, Nair, Ashvin, Zoph, Barret, Ghorbani, Behrooz, Rossen, Ben, Sokolowsky, Benjamin, Barak, Boaz, McGrew, Bob, Minaiev, Borys, Hao, Botao, Baker, Bowen, Houghton, Brandon, McKinzie, Brandon, Eastman, Brydon, Lugaresi, Camillo, Bassin, Cary, Hudson, Cary, Li, Chak Ming, de Bourcy, Charles, Voss, Chelsea, Shen, Chen, Zhang, Chong, Koch, Chris, Orsinger, Chris, Hesse, Christopher, Fischer, Claudia, Chan, Clive, Roberts, Dan, Kappler, Daniel, Levy, Daniel, Selsam, Daniel, Dohan, David, Farhi, David, Mely, David, Robinson, David, Tsipras, Dimitris, Li, Doug, Oprica, Dragos, Freeman, Eben, Zhang, Eddie, Wong, Edmund, Proehl, Elizabeth, Cheung, Enoch, Mitchell, Eric, Wallace, Eric, Ritter, Erik, Mays, Evan, Wang, Fan, Such, Felipe Petroski, Raso, Filippo, Leoni, Florencia, Tsimpourlas, Foivos, Song, Francis, von Lohmann, Fred, Sulit, Freddie, Salmon, Geoff, Parascandolo, Giambattista, Chabot, Gildas, Zhao, Grace, Brockman, Greg, Leclerc, Guillaume, Salman, Hadi, Bao, Haiming, Sheng, Hao, Andrin, Hart, Bagherinezhad, Hessam, Ren, Hongyu, Lightman, Hunter, Chung, Hyung Won, Kivlichan, Ian, O'Connell, Ian, Osband, Ian, Gilaberte, Ignasi Clavera, Akkaya, Ilge, Kostrikov, Ilya, Sutskever, Ilya, Kofman, Irina, Pachocki, Jakub, Lennon, James, Wei, Jason, Harb, Jean, Twore, Jerry, Feng, Jiacheng, Yu, Jiahui, Weng, Jiayi, Tang, Jie, Yu, Jieqi, Candela, Joaquin Quiñonero, Palermo, Joe, Parish, Joel, Heidecke, Johannes, Hallman, John, Rizzo, John, Gordon, Jonathan, Uesato, Jonathan, Ward, Jonathan, Huizinga, Joost, Wang, Julie, Chen, Kai, Xiao, Kai, Singhal, Karan, Nguyen, Karina, Cobbe, Karl, Shi, Katy, Wood, Kayla, Rimbach, Kendra, Gu-Lemberg, Keren, Liu, Kevin, Lu, Kevin, Stone, Kevin, Yu, Kevin, Ahmad, Lama, Yang, Lauren, Liu, Leo, Maksin, Leon, Ho, Leyton, Fedus, Liam, Weng, Lilian, Li, Linden, McCallum, Lindsay, Held, Lindsey, Kuhn, Lorenz, Kondraciuk, Lukas, Kaiser, Lukasz, Metz, Luke, Boyd, Madelaine, Trebacz, Maja, Joglekar, Manas, Chen, Mark, Tintor, Marko, Meyer, Mason, Jones, Matt, Kaufer, Matt, Schwarzer, Max, Shah, Meghan, Yatbaz, Mehmet, Guan, Melody Y., Xu, Mengyuan, Yan, Mengyuan, Glaese, Mia, Chen, Mianna, Lampe, Michael, Malek, Michael, Wang, Michele, Fradin, Michelle, McClay, Mike, Pavlov, Mikhail, Wang, Miles, Wang, Mingxuan, Murati, Mira, Bavarian, Mo, Rohaninejad, Mostafa, McAleese, Nat, Chowdhury, Neil, Chowdhury, Neil, Ryder, Nick, Tezak, Nikolas, Brown, Noam, Nachum, Ofir, Boiko, Oleg, Murk, Oleg, Watkins, Olivia, Chao, Patrick, Ashbourne, Paul, Izmailov, Pavel, Zhokhov, Peter, Dias, Rachel, Arora, Rahul, Lin, Randall, Lopes, Rapha Gontijo, Gaon, Raz, Miyara, Reah, Leike, Reimar, Hwang, Renny, Garg, Rhythm, Brown, Robin, James, Roshan, Shu, Rui, Cheu, Ryan, Greene, Ryan, Jain, Saachi, Altman, Sam, Toizer, Sam, Toyer, Sam, Miserendino, Samuel, Agarwal, Sandhini, Hernandez, Santiago, Baker, Sasha, McKinney, Scott, Yan, Scottie, Zhao, Shengjia, Hu, Shengli, Santurkar, Shibani, Chaudhuri, Shraman Ray, Zhang, Shuyuan, Fu, Siyuan, Papay, Spencer, Lin, Steph, Balaji, Suchir, Sanjeev, Suvansh, Sidor, Szymon, Broda, Tal, Clark, Aidan, Wang, Tao, Gordon, Taylor, Sanders, Ted, Patwardhan, Tejal, Sottiaux, Thibault, Degry, Thomas, Dimson, Thomas, Zheng, Tianhao, Garipov, Timur, Stasi, Tom, Bansal, Trapit, Creech, Trevor, Peterson, Troy, Eloundou, Tyna, Qi, Valerie, Kosaraju, Vineet, Monaco, Vinnie, Pong, Vitchyr, Fomenko, Vlad, Zheng, Weiyi, Zhou, Wenda, McCabe, Wes, Zaremba, Wojciech, Dubois, Yann, Lu, Yinghai, Chen, Yining, Cha, Young, Bai, Yu, He, Yuchen, Zhang, Yuchen, Wang, Yunyun, Shao, Zheng, Li, Zhuohan

The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

large language model, machine learning, natural language, (19 more...)

2412.1672

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)

arXiv.org Artificial IntelligenceApr-18-2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

McKinzie, Brandon, Gan, Zhe, Fauconnier, Jean-Philippe, Dodge, Sam, Zhang, Bowen, Dufter, Philipp, Shah, Dhruti, Du, Xianzhi, Peng, Futang, Weers, Floris, Belyi, Anton, Zhang, Haotian, Singh, Karanjeet, Kang, Doug, Jain, Ankur, Hè, Hongyu, Schwarzer, Max, Gunter, Tom, Kong, Xiang, Zhang, Aonan, Wang, Jianyu, Wang, Chong, Du, Nan, Lei, Tao, Wiseman, Sam, Yin, Guoli, Lee, Mark, Wang, Zirui, Pang, Ruoming, Grasch, Peter, Toshev, Alexander, Yang, Yinfei

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving stateof-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published multimodal pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

2403.09611

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial IntelligenceNov-21-2023

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Schwarzer, Max, Farebrother, Jesse, Greaves, Joshua, Cubuk, Ekin Dogus, Agarwal, Rishabh, Courville, Aaron, Bellemare, Marc G., Kalinin, Sergei, Mordatch, Igor, Castro, Pablo Samuel, Roccapriore, Kevin M.

Sub-atomically focused electron beams in scanning transmission electron microscopes (STEMs) can induce a broad spectrum of chemical changes, including defect formation, reconfiguration of chemical bonds, and dopant insertion. Several groups have shown the feasibility of direct atomic manipulation via electron beam stimulation, which holds great promise for a number of downstream applications such as material design, solid-state quantum computers, and others (Jesse et al, 2018; Susi et al, 2017b; Dyck et al, 2017; Tripathi et al, 2018; Dyck et al, 2018). One of the challenges for advances in this space is that these types of atomic manipulation rely on manual control by highly-trained experts, which is expensive and slow. The ability to accurately automate this type of beam control could thereby result in tremendous impact on the feasibility of atomic manipulation for real use cases. A critical requirement for this automation is accurate estimation of the transition dynamics of atoms when stimulated by focused electron beams.

artificial intelligence, electron beam, machine learning, (20 more...)

2311.17894

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.82)

Industry:

Energy (0.46)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-13-2023

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Schwarzer, Max, Obando-Ceron, Johan, Courville, Aaron, Bellemare, Marc, Agarwal, Rishabh, Castro, Pablo Samuel

We introduce a value-based RL agent, which we 64 call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling 16 the neural networks used for value estimation, as well as a number of other design choices that 4 enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design 1 choices and provide insights for future work. We 2015 2017 2019 2021 2023 end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. Figure 1: Environment samples to reach human-level performance, We make our code and data publicly available.

bbf, machine learning, reinforcement learning, (15 more...)

2305.19452

Country: North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial IntelligenceOct-26-2023

Large Language Models as Generalizable Policies for Embodied Tasks

Szot, Andrew, Schwarzer, Max, Agrawal, Harsh, Mazoure, Bogdan, Talbott, Walter, Metcalf, Katherine, Mackraz, Natalie, Hjelm, Devon, Toshev, Alexander

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language conditioned, massively multi-task, embodied AI problems we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement. Video examples of LLaRP in unseen Language Rearrangement instructions are at https://llm-rl.github.io.

large language model, machine learning, natural language, (17 more...)

2310.17722

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningAug-30-2021

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Agarwal, Rishabh, Schwarzer, Max, Castro, Pablo Samuel, Courville, Aaron, Bellemare, Marc G.

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.

algorithm, computer game, deep learning, (18 more...)

2108.13264

Country: North America > Canada > Quebec > Montreal (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Machine LearningOct-3-2020

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Schwarzer, Max, Anand, Ankesh, Goel, Rishab, Hjelm, R Devon, Courville, Aaron, Bachman, Philip

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https: //github.com/mila-iqia/spr.

artificial intelligence, augmentation, computer game, (15 more...)

2007.05929

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-19-2019

GEAR: Geometry-Aware R\'enyi Information

Gallego, Jose, Vani, Ankit, Schwarzer, Max, Lacoste-Julien, Simon

Shannon's seminal theory of information has been of paramount importance in the development of modern machine learning techniques. However, standard information measures deal with probability distributions over an alphabet considered as a mere set of symbols and disregard further geometric structure, which might be available in the form of a metric or similarity function. We advocate the use of a notion of entropy that reflects not only the relative abundances of symbols but also the similarities between them, which was originally introduced in theoretical ecology to study the diversity of biological communities. Echoing this idea, we propose a criterion for comparing two probability distributions (possibly degenerate and with non-overlapping supports) that takes into account the geometry of the space in which the distributions are defined. Our proposal exhibits performance on par with state-of-the-art methods based on entropy-regularized optimal transport, but enjoys a closed-form expression and thus a lower computational cost. We demonstrate the versatility of our proposal via experiments on a broad range of domains: computing image barycenters, approximating densities with a collection of (super-) samples; summarizing texts; assessing mode coverage; as well as training generative models.

artificial intelligence, entropy, neural network, (18 more...)

1906.08325

Country:

North America > United States > Virginia (0.14)
North America > United States > California (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningOct-14-2018

Learning to fail: Predicting fracture evolution in brittle materials using recurrent graph convolutional neural networks

Schwarzer, Max, Rogan, Bryce, Ruan, Yadong, Song, Zhengming, Lee, Diana, Percus, Allon G., Chau, Viet T., Moore, Bryan A., Rougier, Esteban, Viswanathan, Hari S., Srinivasan, Gowri

Understanding dynamic fracture propagation is essential to predicting how brittle materials fail. Various mathematical models and computational applications have been developed to predict fracture evolution and coalescence, including finite-discrete element methods such as the Hybrid Optimization Software Suite (HOSS). While such methods achieve high fidelity results, they can be computationally prohibitive: a single simulation takes hours to run, and thousands of simulations are required for a statistically meaningful ensemble. We propose a machine learning approach that, once trained on data from HOSS simulations, can predict fracture growth statistics within seconds. Our method uses deep learning, exploiting the capabilities of a graph convolutional network to recognize features of the fracturing material, along with a recurrent neural network to model the evolution of these features. In this way, we simultaneously generate predictions for qualitatively distinct material properties. Our prediction for total damage in a coalesced fracture, at the final simulation time step, is within 3% of its actual value, and our prediction for total length of a coalesced fracture is within 2%. We also develop a novel form of data augmentation that compensates for the modest size of our training data, and an ensemble learning approach that enables us to predict when the material fails, with a mean absolute error of approximately 15%.

deep learning, fracture, upstream oil & gas, (18 more...)

1810.06118

Country: North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Government (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)