open foundation model
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
NVIDIA, Bjorck, Johan, Castañeda, Fernando, Cherniadev, Nikita, Da, Xingye, Ding, Runyu, Fan, Linxi "Jim", Fang, Yu, Fox, Dieter, Hu, Fengyuan, Huang, Spencer, Jang, Joel, Jiang, Zhenyu, Kautz, Jan, Kundalia, Kaushil, Lao, Lawrence, Li, Zhiqi, Lin, Zongyu, Lin, Kevin, Liu, Guilin, Llontop, Edith, Magne, Loic, Mandlekar, Ajay, Narayan, Avnish, Nasiriany, Soroush, Reed, Scott, Tan, You Liang, Wang, Guanzhi, Wang, Zu, Wang, Jing, Wang, Qi, Xiang, Jiannan, Xie, Yuqi, Xu, Yinzhen, Xu, Zhenjia, Ye, Seonghyeon, Yu, Zhiding, Zhang, Ao, Zhang, Hao, Zhao, Yizhou, Zheng, Ruijie, Zhu, Yuke
General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture of real-robot trajectories, human videos, and synthetically generated datasets. We show that our generalist robot model GR00T N1 outperforms the state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments. Furthermore, we deploy our model on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks, achieving strong performance with high data efficiency.
Open foundation models for Azerbaijani language
Isbarov, Jafar, Huseynova, Kavsar, Mammadov, Elvin, Hajili, Mammad
The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.
PRISM: A Design Framework for Open-Source Foundation Model Safety
Neumann, Terrence, Jones, Bryan
The rapid advancement of open-source foundation models has brought transparency and accessibility to this groundbreaking technology. However, this openness has also enabled the development of highly-capable, unsafe models, as exemplified by recent instances such as WormGPT and FraudGPT, which are specifically designed to facilitate criminal activity. As the capabilities of open foundation models continue to grow, potentially outpacing those of closed-source models, the risk of misuse by bad actors poses an increasingly serious threat to society. This paper addresses the critical question of how open foundation model developers should approach model safety in light of these challenges. Our analysis reveals that open-source foundation model companies often provide less restrictive acceptable use policies (AUPs) compared to their closed-source counterparts, likely due to the inherent difficulties in enforcing such policies once the models are released. To tackle this issue, we introduce PRISM, a design framework for open-source foundation model safety that emphasizes Private, Robust, Independent Safety measures, at Minimal marginal cost of compute. The PRISM framework proposes the use of modular functions that moderate prompts and outputs independently of the core language model, offering a more adaptable and resilient approach to safety compared to the brittle reinforcement learning methods currently used for value alignment. By focusing on identifying AUP violations and engaging the developer community in establishing consensus around safety design decisions, PRISM aims to create a safer open-source ecosystem that maximizes the potential of these powerful technologies while minimizing the risks to individuals and society as a whole.
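The modular idea in the PRISM abstract, moderation functions that check prompts and outputs independently of the core language model, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ModeratedModel` wrapper, the `keyword_moderator` helper, the refusal message, and the `echo_model` stand-in are all hypothetical names introduced here, and simple keyword matching is only a placeholder for whatever AUP-violation classifier a developer would actually use.

```python
from dataclasses import dataclass
from typing import Callable, List

# A moderator takes text and returns True when it flags an AUP violation.
# (Hypothetical interface, assumed for illustration only.)
Moderator = Callable[[str], bool]


@dataclass
class ModeratedModel:
    """Wraps any text-in/text-out model with independent safety checks."""
    model: Callable[[str], str]
    prompt_moderators: List[Moderator]
    output_moderators: List[Moderator]
    refusal: str = "Request declined under the acceptable use policy."

    def generate(self, prompt: str) -> str:
        # Check the prompt before the core model ever sees it.
        if any(check(prompt) for check in self.prompt_moderators):
            return self.refusal
        output = self.model(prompt)
        # Check the output separately; the core model is left unmodified.
        if any(check(output) for check in self.output_moderators):
            return self.refusal
        return output


def keyword_moderator(banned: List[str]) -> Moderator:
    """Placeholder check: flags text containing any banned term."""
    lowered = [term.lower() for term in banned]
    return lambda text: any(term in text.lower() for term in lowered)


def echo_model(prompt: str) -> str:
    """Stand-in for a real language model."""
    return f"echo: {prompt}"


guarded = ModeratedModel(
    model=echo_model,
    prompt_moderators=[keyword_moderator(["phishing"])],
    output_moderators=[keyword_moderator(["malware"])],
)
```

Because the checks live outside the model, each moderator can be swapped or updated without retraining, which is the adaptability the abstract contrasts with alignment baked into the model's weights.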
Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence
Basdevant, Adrien, François, Camille, Storchan, Victor, Bankston, Kevin, Bdeir, Ayah, Behlendorf, Brian, Debbah, Merouane, Kapoor, Sayash, LeCun, Yann, Surman, Mark, King-Turvey, Helen, Lambert, Nathan, Maffulli, Stefano, Marda, Nik, Shivkumar, Govind, Tunney, Justine
Over the past year, there has been a robust debate about the benefits and risks of open sourcing foundation models. However, this discussion has often taken place at a high level of generality or with a narrow focus on specific technical attributes. In part, this is because defining open source for foundation models has proven tricky, given its significant differences from traditional software development. In order to inform more practical and nuanced decisions about opening AI systems, including foundation models, this paper presents a framework for grappling with openness across the AI stack. It summarizes previous work on this topic, analyzes the various potential reasons to pursue openness, and outlines how openness varies in different parts of the AI stack, both at the model and at the system level. In doing so, its authors hope to provide a common descriptive framework to deepen a nuanced and rigorous understanding of openness in AI and enable further work around definitions of openness and safety in AI.
On the Societal Impact of Open Foundation Models
Kapoor, Sayash, Bommasani, Rishi, Klyman, Kevin, Longpre, Shayne, Ramaswami, Ashwin, Cihon, Peter, Hopkins, Aspen, Bankston, Kevin, Biderman, Stella, Bogen, Miranda, Chowdhury, Rumman, Engler, Alex, Henderson, Peter, Jernite, Yacine, Lazar, Seth, Maffulli, Stefano, Nelson, Alondra, Pineau, Joelle, Skowron, Aviya, Song, Dawn, Storchan, Victor, Zhang, Daniel, Ho, Daniel E., Liang, Percy, Narayanan, Arvind
Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to both their benefits and risks. Open foundation models present significant benefits, with some caveats, that span innovation, competition, the distribution of decision-making power, and transparency. To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk. Across several misuse vectors (e.g. cyberattacks, bioweapons), we find that current research is insufficient to effectively characterize the marginal risk of open foundation models relative to pre-existing technologies. The framework helps explain why the marginal risk is low in some cases, clarifies disagreements about misuse risks by revealing that past work has focused on different subsets of the framework with different assumptions, and articulates a way forward for more constructive debate. Overall, our work helps support a more grounded assessment of the societal impact of open foundation models by outlining what research is needed to empirically validate their theoretical benefits and risks.