Today, NVIDIA posted the fastest results on new MLPerf benchmarks measuring the performance of AI inference workloads in data centers and at the edge. The new results come on the heels of the company's equally strong results in the MLPerf training benchmarks posted earlier this year. MLPerf's five inference benchmarks -- applied across a range of form factors and four inference scenarios -- cover such established AI applications as image classification, object detection and translation. NVIDIA topped all five benchmarks for both data center-focused scenarios (server and offline), with Turing GPUs providing the highest performance per processor among commercially available entries. Xavier provided the highest performance among commercially available edge and mobile SoCs under both edge-focused scenarios (single-stream and multistream).
Nvidia is touting another win on the latest set of MLPerf benchmarks released Wednesday. The GPU maker said it posted the fastest results on new MLPerf inference benchmarks, which measured the performance of AI inference workloads in data centers and at the edge. MLPerf's five inference benchmarks, applied across four inference scenarios, covered AI applications such as image classification, object detection and translation. Nvidia topped all five benchmarks for both data center-focused scenarios (server and offline) with its Turing GPUs. Meanwhile, the Xavier SoC turned in the highest performance among the commercially available edge and mobile SoCs submitted to MLPerf, under both edge-focused scenarios, single-stream and multi-stream.
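The scenarios named above differ mainly in how queries arrive and which metric is reported: single-stream issues one query at a time and measures per-query latency, while offline submits the whole workload at once and measures throughput (server and multi-stream sit in between, with queries arriving on a schedule). As a rough sketch of that distinction -- not MLPerf's actual LoadGen harness, and with a hypothetical `run_model` standing in for a real neural network -- the two metrics can be illustrated like this:

```python
import time

def run_model(batch):
    """Stand-in for a real inference call (hypothetical; a real harness
    would invoke a trained network). Simulates ~1 ms of work per sample."""
    time.sleep(0.001 * len(batch))
    return [x * 2 for x in batch]

def single_stream(samples):
    """Single-stream scenario: one query at a time; the metric is latency."""
    latencies = []
    for s in samples:
        t0 = time.perf_counter()
        run_model([s])
        latencies.append(time.perf_counter() - t0)
    return max(latencies)  # report the worst-case (tail) latency

def offline(samples):
    """Offline scenario: submit everything at once; the metric is throughput."""
    t0 = time.perf_counter()
    run_model(samples)
    elapsed = time.perf_counter() - t0
    return len(samples) / elapsed  # samples per second

samples = list(range(32))
print(f"single-stream tail latency: {single_stream(samples) * 1e3:.1f} ms")
print(f"offline throughput: {offline(samples):.0f} samples/sec")
```

The same hardware can rank differently under the two metrics, which is why MLPerf reports them separately rather than as a single score.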
The first MLPerf benchmark results are in, offering a new, objective measurement of the tools used to run AI workloads. The results show that Nvidia, up against solutions from Google and Intel, achieved the best performance in the six categories for which it submitted. MLPerf is a broad benchmark suite for measuring the performance of machine learning (ML) software frameworks (such as TensorFlow, PyTorch and MXNet), ML hardware platforms (including Google TPUs, Intel CPUs and Nvidia GPUs) and ML cloud platforms. Several companies, as well as researchers from institutions like Harvard, Stanford and the University of California, Berkeley, first agreed to support the benchmarks in May. The goal is to give developers and enterprise IT teams information to help them evaluate existing offerings and focus future development.
Nvidia has released a new version of TensorRT, a runtime system for serving inferences using deep learning models through Nvidia's own GPUs. Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia's strategy to drive greater adoption of its processors, countering what AMD is doing to break Nvidia's stranglehold on the machine learning GPU market. Nvidia claims the GPU-based TensorRT is better across the board for inferencing than CPU-only approaches. One of Nvidia's proffered benchmarks, the AlexNet image classification test under the Caffe framework, claims TensorRT to be 42 times faster than a CPU-only version of the same test -- 16,041 images per second vs. 374 -- when run on Nvidia's Tesla P40 GPU.
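The 42x claim is simply the ratio of those two throughput figures; a quick sanity check of the arithmetic:

```python
# Throughput figures from Nvidia's published AlexNet/Caffe benchmark
gpu_ips = 16041   # images/sec, TensorRT on a Tesla P40
cpu_ips = 374     # images/sec, CPU-only baseline

speedup = gpu_ips / cpu_ips
print(f"speedup: {speedup:.1f}x")  # roughly 42.9x, consistent with the ~42x claim
```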
After introducing its first inference benchmark suite in June 2019, the MLPerf consortium today released 595 inference benchmark results from 14 organizations. The MLPerf Inference v0.5 machine learning inference benchmark has been designed to measure how well and how quickly various accelerators and systems execute trained neural networks. The initial version of the benchmark, v0.5, currently covers only five networks/benchmarks, and it doesn't yet include any power-testing metrics, which would be necessary to measure overall energy efficiency. Even so, the benchmark has attracted the attention of the major hardware vendors, all of whom are keen to show off what their hardware can do on a standardized test and to demonstrate to clients why their solution is superior. Of the 595 benchmark results released today, 166 are in the Closed Division, which is intended for direct comparison of systems.