MLPerf Benchmark


Nvidia's Impressive H100 MLPerf Benchmark

#artificialintelligence

In the complex world of AI/ML processing, it can be hard to compare products from different vendors because of the wide range of models and workloads in use. MLPerf is a consortium of major industry players and research organizations that provides agreed-upon benchmark tests to standardize results across vendor offerings, giving users a way to evaluate competing performance claims. Nvidia previously published MLPerf test results for its A100 product, and it has now released MLPerf benchmarks for its new high-end device, the H100. The H100 shows an impressive 6.7x performance gain over the older A100 in certain workloads, and its software is still being optimized, which could eventually push performance even higher.


Nvidia's flagship AI chip reportedly 4.5x faster than the previous champ

#artificialintelligence

Nvidia announced yesterday that its upcoming H100 "Hopper" Tensor Core GPU set new performance records during its debut in the industry-standard MLPerf benchmarks, delivering results up to 4.5 times faster than the A100, currently Nvidia's fastest production AI chip. The MLPerf benchmarks (technically called "MLPerf™ Inference 2.1") measure "inference" workloads, which demonstrate how well a chip can apply a previously trained machine learning model to new data. A group of industry firms known as MLCommons developed the MLPerf benchmarks in 2018 to deliver a standardized metric for conveying machine learning performance to potential customers. In particular, the H100 did well in the BERT-Large benchmark, which measures natural language processing performance using the BERT model developed by Google. Nvidia credits this result to the Hopper architecture's Transformer Engine, which specifically accelerates the training of transformer models.


NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests

#artificialintelligence

In its debut in the industry MLPerf benchmarks, NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar in per-accelerator performance at the edge. Overall, NVIDIA and its partners continued to show the highest performance and broadest ecosystem for running all machine-learning workloads and scenarios in this fifth round of the industry metric for production AI. In edge AI, a pre-production version of NVIDIA Orin led in five of six performance tests. It ran up to 5x faster than the previous-generation Jetson AGX Xavier, while delivering an average of 2x better energy efficiency. NVIDIA Orin is available today in the NVIDIA Jetson AGX Orin developer kit for robotics and autonomous systems.
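The "5x faster at 2x better energy efficiency" claim combines two separate ratios: throughput and performance per watt. A minimal sketch of how such an efficiency comparison works, using made-up throughput and power figures purely for illustration (these are not measured MLPerf results):

```python
def perf_per_watt(throughput_inf_per_s, power_watts):
    """Energy efficiency expressed as inferences per second per watt."""
    return throughput_inf_per_s / power_watts

# Hypothetical numbers: an older SoC at 1000 inf/s drawing 30 W versus a
# newer one at 5000 inf/s drawing 60 W. Five times the speed at twice the
# power works out to 2.5x the energy efficiency.
old_eff = perf_per_watt(1000, 30)
new_eff = perf_per_watt(5000, 60)
print(round(new_eff / old_eff, 2))  # 2.5
```

The point of the sketch is that a "better energy efficiency" ratio is not the same as a raw speedup ratio; both throughput and power must be known.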


MLPerf Benchmarks: The Secret Behind Successful AI

#artificialintelligence

Even though they have been around for years, the phrase "MLPerf benchmarks" holds little meaning for most people outside the AI developer community. However, this community-driven benchmark suite, which measures the performance of a broad range of machine learning (ML) tasks, is quickly becoming the gold standard for the fair and unbiased assessment of accelerated computing solutions for machine learning training, inference, and high performance computing (HPC). The era of MLPerf is here, and everyone should be paying attention. Organizations across every industry are racing to take advantage of AI and machine learning to improve their businesses. According to Karl Freund, founder and principal analyst at Cambrian AI Research, businesses should expect customer demand for AI-accelerated outcomes to continue to grow.


NVIDIA Wins MLPerf Inference Benchmarks – NVIDIA Developer News Center

#artificialintelligence

Today, NVIDIA posted the fastest results on new MLPerf benchmarks measuring the performance of AI inference workloads in data centers and at the edge. The new results come on the heels of the company's equally strong results in the MLPerf benchmarks posted earlier this year. MLPerf's five inference benchmarks -- applied across a range of form factors and four inferencing scenarios -- cover such established AI applications as image classification, object detection, and translation. NVIDIA topped all five benchmarks for both data-center-focused scenarios (server and offline), with Turing GPUs providing the highest performance per processor among commercially available entries. Xavier provided the highest performance among commercially available edge and mobile SoCs under both edge-focused scenarios (single-stream and multi-stream). All of NVIDIA's MLPerf results were achieved using NVIDIA TensorRT 6, high-performance deep learning inference software that makes it easy to optimize and deploy AI applications in production from the data center to the edge.
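The scenarios named above differ mainly in what they measure: single-stream issues one query at a time and reports latency, while offline submits the whole dataset at once and reports throughput. A rough sketch of those two measurement loops, with a dummy function standing in for a real model (this is an illustration of the idea only, not MLPerf's actual LoadGen harness; all names and numbers are invented):

```python
import statistics
import time

def fake_infer(batch):
    # Stand-in for a real model: burns a little CPU per sample.
    return [sum(range(1000)) for _ in batch]

def single_stream(n_queries=50):
    """Single-stream scenario: one query at a time; report median latency."""
    latencies = []
    for _ in range(n_queries):
        t0 = time.perf_counter()
        fake_infer([0])
        latencies.append(time.perf_counter() - t0)
    return statistics.median(latencies)

def offline(n_samples=500, batch_size=100):
    """Offline scenario: process everything in large batches; report throughput."""
    t0 = time.perf_counter()
    for _ in range(0, n_samples, batch_size):
        fake_infer(range(batch_size))
    elapsed = time.perf_counter() - t0
    return n_samples / elapsed

print(f"median single-stream latency: {single_stream():.6f} s")
print(f"offline throughput: {offline():.0f} samples/s")
```

Server and multi-stream scenarios add further constraints (query arrival distributions, per-stream deadlines) on top of these two basic loops.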


Scale MLPerf-0.6 models on Google TPU-v3 Pods

Kumar, Sameer, Bitorff, Victor, Chen, Dehao, Chou, Chiachen, Hechtman, Blake, Lee, HyoukJoong, Kumar, Naveen, Mattson, Peter, Wang, Shibo, Wang, Tao, Xu, Yuanzhong, Zhou, Zongwei

arXiv.org Artificial Intelligence

The recent submission of Google TPU-v3 Pods to the industry-wide MLPerf v0.6 training benchmark demonstrates the scalability of a suite of industry-relevant ML models. MLPerf defines a suite of models, datasets, and rules to follow when benchmarking to ensure results are comparable across hardware, frameworks, and companies. Using this suite of models, we discuss the optimizations and techniques, including the choice of optimizer, spatial partitioning, and weight-update sharding, necessary to scale to 1024 TPU chips. Furthermore, we identify properties of models that make scaling them challenging, such as limited data parallelism and unscaled weights. These optimizations contribute to record performance in Transformer, ResNet-50, and SSD in the Google MLPerf-0.6 submission.
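Weight-update sharding, one of the optimizations the abstract mentions, splits the optimizer step across replicas: each replica applies the update only to its own slice of the parameters, and the updated slices are then gathered back so every replica again holds the full weights. A toy single-process illustration of the idea (the function names and numbers here are invented for this sketch and do not come from the paper):

```python
def shard_bounds(n_params, n_replicas, r):
    """Contiguous parameter range [lo, hi) owned by replica r."""
    per = n_params // n_replicas
    lo = r * per
    hi = (r + 1) * per if r < n_replicas - 1 else n_params
    return lo, hi

def sharded_update(weights, grads, lr, n_replicas):
    """Plain SGD step where each replica updates only its own shard."""
    new_w = list(weights)
    for r in range(n_replicas):
        lo, hi = shard_bounds(len(weights), n_replicas, r)
        for i in range(lo, hi):  # replica r touches only indices [lo, hi)
            new_w[i] = weights[i] - lr * grads[i]
    # In a real system the shards would now be all-gathered across replicas.
    return new_w

w = [1.0, 2.0, 3.0, 4.0]
g = [0.1, 0.1, 0.1, 0.1]
print(sharded_update(w, g, lr=1.0, n_replicas=2))  # each weight drops by lr * grad
```

The result is identical to an unsharded update; the win is that each replica does only 1/N of the optimizer-step work and stores only 1/N of the optimizer state.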


Demystifying the MLPerf Benchmark Suite

Verma, Snehil, Wu, Qinzhe, Hanindhito, Bagus, Jha, Gunjan, John, Eugene B., Radhakrishnan, Ramesh, John, Lizy K.

arXiv.org Machine Learning

MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of machine learning applications. We present a study of its characteristics and of how the MLPerf benchmarks differ from previous deep learning benchmarks such as DAWNBench and DeepBench. We find that application benchmarks such as MLPerf (although rich in kernels) exhibit different features compared to kernel benchmarks such as DeepBench. The MLPerf benchmark suite contains a diverse set of models, which allows various bottlenecks in the system to be unveiled. Based on our findings, a dedicated low-latency interconnect between GPUs in multi-GPU systems is required for optimal distributed deep learning training. We also observe variation in scaling efficiency across the MLPerf models; this variation highlights the importance of smart scheduling strategies for multi-GPU training. Another observation is that CPU utilization increases with the number of GPUs used for training. Corroborating prior work, we also observe and quantify the improvements possible through compiler optimizations, mixed-precision training, and the use of Tensor Cores.
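Scaling efficiency, as discussed in the abstract, is commonly summarized as the fraction of ideal linear speedup a multi-GPU run actually achieves. A small illustrative calculation, with made-up throughput numbers (not figures from the paper):

```python
def scaling_efficiency(throughput_1gpu, throughput_ngpu, n_gpus):
    """Fraction of ideal linear speedup achieved when scaling to n_gpus."""
    speedup = throughput_ngpu / throughput_1gpu
    return speedup / n_gpus

# Hypothetical: 100 images/s on 1 GPU, 700 images/s on 8 GPUs.
# Speedup is 7x against an ideal of 8x, i.e. 87.5% scaling efficiency.
print(scaling_efficiency(100, 700, 8))  # 0.875
```

Models with limited data parallelism tend to show low values of this ratio, which is one reason per-model scheduling strategies matter for multi-GPU training.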