Training a 20-Billion Parameter AI Model on a Single Processor - EETimes


Cerebras has shown off the capabilities of its second-generation wafer-scale engine, announcing that it has set the record for the largest AI model ever trained on a single device. For the first time, a natural language processing network with 20 billion parameters, GPT-NeoX 20B, was trained on one chip.

Transformers, a newer class of neural network, are rapidly taking over AI. Today they are used mainly for natural language processing (NLP), where their attention mechanism helps capture the relationships between words in a sentence, but they are spreading to other applications, including computer vision. In general, the larger a transformer is, the more accurate it tends to be.
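The attention mechanism mentioned above can be illustrated with a minimal sketch of scaled dot-product self-attention, the core operation in transformers. This is a generic NumPy illustration, not Cerebras- or GPT-NeoX-specific code; the function name and toy dimensions are chosen for this example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Each output row is a weighted mix of the value vectors V,
    where the weights reflect how strongly each query token
    relates to every key token.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token affinities
    # numerically stable softmax over each row
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 "tokens", each a 4-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(w)  # each row sums to 1: how much each token attends to the others
```

In a real transformer, Q, K, and V are separate learned linear projections of the input, and many such attention "heads" run in parallel; this sketch shows only the core weighting step.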