Nvidia's Speedy New Inference Engine Keeps BERT Latency Within a Millisecond
Disappointment abounds when your data scientists tune a deep learning model to high accuracy, only to be forced to gut it for inference because of resource constraints. Fortunately, that should happen less often with the latest release of Nvidia's TensorRT inference engine, which can run the BERT-Large transformer model with less than a millisecond of latency, according to the AI systems maker. "Traditionally, training for AI is always done in the data center," Siddharth Sharma, Nvidia's head of product marketing for AI Software, said in a briefing on Monday, July 19. "You start with petabytes of data, hundreds of thousands of hours of speech data. You train the model to the highest accuracy that you can. And then once you trained it, you actually throw it over for inference."
Aug-3-2021, 17:11:53 GMT