Democratizing access to large-scale language models with OPT-175B
We achieved 147 TFLOP/s/GPU utilization on NVIDIA's 80 GB A100 GPUs, roughly 17 percent higher than the figure published by NVIDIA researchers on similar hardware. By sharing these baselines along with the codebase to train a 175B model efficiently, we have an opportunity to reduce our collective environmental footprint. Sharing them also lets new results and progress in the field be measured in a consistent manner.
For AI research to advance, the broader scientific community must be able to work with cutting-edge models to explore their potential while also probing their vulnerabilities. As with our previous open-science initiatives, such as the Image Similarity Challenge, the Deepfake Detection Challenge, and the Hateful Memes Challenge, Meta AI believes that collaboration across research organizations is critical to the responsible development of AI technologies.
While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for possible harms, leaving that work to only those with enough capital to access models of this scale.
We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.
Access the open source code and small-scale pretrained models here, request access to OPT-175B here, and read the paper here. All pretrained models are licensed under the OPT-175B License Agreement.
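The post reports 147 TFLOP/s per GPU and says this is about 17 percent above NVIDIA's published figure. The short back-of-envelope sketch below shows what those two numbers imply. It assumes a peak of 312 TFLOP/s, which is the A100's dense BF16/FP16 Tensor Core throughput from NVIDIA's datasheet. The post does not say which peak, if any, it compares against. The baseline value is derived here, not quoted, and because "roughly 17 percent" is itself rounded, the result is approximate.

```python
# Back-of-envelope check of the throughput figures quoted in the post.
# ASSUMPTION: 312 TFLOP/s is the A100's peak dense BF16/FP16 Tensor Core
# throughput (NVIDIA datasheet, no sparsity); the post does not name a peak.

A100_PEAK_BF16_TFLOPS = 312.0   # assumed hardware peak per GPU
OPT_ACHIEVED_TFLOPS = 147.0     # per-GPU throughput reported in the post
REPORTED_IMPROVEMENT = 0.17     # "roughly 17 percent higher" (rounded)


def implied_baseline(achieved: float, improvement: float) -> float:
    """Baseline implied by 'achieved is `improvement` higher than baseline'."""
    return achieved / (1.0 + improvement)


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Achieved throughput as a fraction of the assumed hardware peak."""
    return achieved / peak


baseline = implied_baseline(OPT_ACHIEVED_TFLOPS, REPORTED_IMPROVEMENT)
utilization = fraction_of_peak(OPT_ACHIEVED_TFLOPS, A100_PEAK_BF16_TFLOPS)

print(f"implied NVIDIA baseline: {baseline:.1f} TFLOP/s/GPU")
print(f"fraction of assumed peak: {utilization:.1%}")
```

Under these assumptions the implied earlier figure is about 125.6 TFLOP/s/GPU. The reported 147 TFLOP/s is about 47 percent of the assumed peak.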
Jun-6-2022, 01:00:14 GMT