Nvidia's Speedy New Inference Engine Keeps BERT Latency Within a Millisecond

Aug-3-2021, 17:11:53 GMT

It is disappointing when your data scientists tune a deep learning model to a high degree of accuracy, only to be forced to gut it for inference because of resource constraints. Fortunately, that should not happen often with the latest release of Nvidia's TensorRT inference engine. According to the AI systems maker, it can run the BERT-Large transformer model with less than a millisecond of latency.

"Traditionally, training for AI is always done in the data center," Siddharth Sharma, Nvidia's head of product marketing for AI software, said in a briefing on Monday, July 19. "You start with petabytes of data, hundreds of thousands of hours of speech data. You train the model to the highest accuracy that you can. And then once you trained it, you actually throw it over for inference."
