HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
Getting large language models (LLMs) to perform well on downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. The growing number of applications leveraging AI/ML has led to a scarcity of expensive conventional accelerators (such as GPUs), creating a need for alternative specialized accelerators that are scalable and cost-efficient. AWS Trainium is the second-generation machine learning accelerator purpose-built for training large deep learning models. Its corresponding instance, Amazon EC2 Trn1, is an alternative to GPU instances for LLM training. However, training LLMs with billions of parameters on Trn1 is challenging due to its relatively nascent software ecosystem. In this paper, we showcase HLAT: a 7-billion-parameter decoder-only LLM pre-trained on Trn1 instances over 1.8 trillion tokens. The performance of HLAT is benchmarked against popular open-source baseline models, including LLaMA and OpenLLaMA, which were trained on NVIDIA GPUs and Google TPUs, respectively. On various evaluation tasks, we show that HLAT achieves model quality on par with the baselines. We also share best practices for using the Neuron Distributed Training Library (NDTL), a customized distributed training library for AWS Trainium, to achieve efficient training. Our work demonstrates that AWS Trainium, powered by NDTL, can successfully pre-train state-of-the-art LLMs with high performance and cost-effectiveness.
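To make the training stack the abstract describes more concrete, here is a minimal sketch of tensor-parallel setup on Trainium using the open-source neuronx-distributed package, the public counterpart of what the paper calls NDTL. The tensor-parallel degree, layer dimensions, and launcher setup below are illustrative assumptions, not the authors' actual configuration.

    # Minimal sketch: tensor-parallel setup on Trainium with neuronx-distributed.
    # Degree and sizes are illustrative assumptions, not the paper's settings.
    import torch
    import torch.distributed as dist
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.xla_backend  # registers the "xla" backend
    from neuronx_distributed.parallel_layers import parallel_state, ColumnParallelLinear

    # Rank/world-size environment variables are expected to be set by the
    # launcher (e.g. torchrun); "xla" is the process-group backend for Trainium.
    dist.init_process_group("xla")

    # Shard each layer's weights across 8 NeuronCores (tensor parallelism);
    # the remaining ranks form the data-parallel dimension.
    parallel_state.initialize_model_parallel(tensor_model_parallel_size=8)

    device = xm.xla_device()

    # A column-parallel projection: the weight matrix is split along its
    # output dimension across the tensor-parallel group, Megatron-LM style.
    proj = ColumnParallelLinear(input_size=4096, output_size=11008, bias=False).to(device)

    x = torch.randn(1, 128, 4096, device=device)
    y = proj(x)
    xm.mark_step()  # flush the lazily traced graph so it compiles and executes

In practice one such script is launched with one process per NeuronCore; the parallel layers then handle the sharded matmuls and the collectives between shards transparently.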
Top AI Chip Announcements Of 2020
Last year, we compiled a list of top chips for accelerating ML tasks. We talked about the rising demand for AI-focused systems-on-chip, and 2020 was no different: the trend continued. While a few chipmakers capitalised on this trend, chip giants like Intel went through a tough period, even selling their NAND division to South Korean chipmaker SK Hynix. Apple, too, announced its move away from Intel processors, opening a new chapter with Apple Silicon.
AWS' custom chip family expands, launches Trainium for machine learning models
AWS is launching its own machine learning chip to train models for what CEO Andy Jassy says will be the "most cost effective training in the cloud." The custom machine learning processor, called AWS Trainium, follows what is becoming a common blueprint for AWS's silicon strategy. AWS is ultimately targeting enterprises that are just starting to train models and build out their AI strategies. Trainium will launch in 2021, following AWS instances based on Intel's Habana Gaudi processors.
Amazon debuts Trainium, a custom chip for machine learning training in the cloud
Amazon today debuted AWS Trainium, a chip custom-designed to deliver what the company describes as cost-effective machine learning model training in the cloud. It comes ahead of the availability of new Amazon Elastic Compute Cloud (EC2) instances built specifically for machine learning training and powered by Intel's new Habana Gaudi processors. "We know that we want to keep pushing the price performance on machine learning training, so we're going to have to invest in our own chips," AWS CEO Andy Jassy said during a keynote address at Amazon's re:Invent conference this morning. "You have an unmatched array of instances in AWS, coupled with innovation in chips." Amazon claims that Trainium will offer the most teraflops of any machine learning instance in the cloud, where one teraflop corresponds to processing 1 trillion floating-point operations per second.
The Cambrian AI Explosion Ramps Up
There's been a lot of news lately on the AI chip front, so I wanted to share a short synopsis of what has been happening for anyone who may be distracted by the holidays. Let's start with the big news. Amazon AWS (AMZN) made two significant AI announcements on December 1st at the annual AWS re:Invent conference. First, Andy Jassy, AWS head, announced that the cloud leader would offer Intel's Gaudi training chip in its elastic cloud. The AWS deployment is the first traction we have seen for Gaudi, which Intel gained through its $2B acquisition of Habana Labs last year. This is long-awaited good news for Intel.