Nvidia's Speedy New Inference Engine Keeps BERT Latency Within a Millisecond
