Amazon begins shifting Alexa's cloud AI to its own silicon
Nov-15-2020, 13:05:16 GMT
On Thursday, an Amazon AWS blog post announced that the company has moved most of the cloud processing for its Alexa personal assistant off Nvidia GPUs and onto its own Inferentia Application-Specific Integrated Circuit (ASIC).

AWS Inferentia is a custom chip, built by AWS, designed to accelerate machine-learning inference workloads and optimize their cost. Each Inferentia chip contains four NeuronCores, and each NeuronCore implements a high-performance systolic-array matrix-multiply engine, which massively speeds up typical deep-learning operations such as convolution and transformer layers (a conceptual sketch of this dataflow follows below). NeuronCores are also equipped with a large on-chip cache, which cuts down on external memory accesses, dramatically reducing latency and increasing throughput.

When an Amazon customer, typically the owner of an Echo or Echo Dot smart speaker, uses the Alexa assistant, very little of the processing is done on the device itself.
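To make the systolic-array idea concrete, here is a minimal Python sketch of an output-stationary systolic matrix multiply. It illustrates the general dataflow technique only; the grid layout, input skewing, and function name are illustrative assumptions, not a description of Inferentia's actual microarchitecture.

```python
import numpy as np


def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cycle-level simulation of an output-stationary systolic array.

    Illustrative sketch only, not Inferentia's real design. Each
    processing element (PE) at grid position (i, j) accumulates the
    single output C[i, j]. A values stream left-to-right across rows,
    B values stream top-to-bottom down columns, and every PE performs
    one multiply-accumulate per cycle on the operands that reach it.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"

    C = np.zeros((n, m))
    a_grid = np.zeros((n, m))  # A operand currently held by each PE
    b_grid = np.zeros((n, m))  # B operand currently held by each PE

    # Inputs are skewed: row i of A (column j of B) starts i (j) cycles
    # late, so matching operands meet at each PE on the same cycle.
    for t in range(n + m + k - 2):
        new_a = np.zeros((n, m))
        new_b = np.zeros((n, m))
        for i in range(n):
            for j in range(m):
                # Take operands from the left/top neighbor, or from the
                # skewed input stream when sitting on the array's edge.
                new_a[i, j] = a_grid[i, j - 1] if j > 0 else (
                    A[i, t - i] if 0 <= t - i < k else 0.0)
                new_b[i, j] = b_grid[i - 1, j] if i > 0 else (
                    B[t - j, j] if 0 <= t - j < k else 0.0)
                C[i, j] += new_a[i, j] * new_b[i, j]
        a_grid, b_grid = new_a, new_b
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))
    B = rng.standard_normal((5, 4))
    assert np.allclose(systolic_matmul(A, B), A @ B)
    print("systolic simulation matches A @ B")
```

The key property the sketch shows is that each operand is read from external memory only once, at the array's edge, and is then reused by an entire row or column of processing elements. That reuse, combined with a large on-chip cache, is what lets a hardware systolic array sustain high throughput while minimizing trips to external memory.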