infiniband
Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness
Tan, Zeyuan, Yuan, Xiulong, He, Congjie, Sit, Man-Kit, Li, Guo, Liu, Xiaoze, Ai, Baole, Zeng, Kai, Pietzuch, Peter, Mai, Luo
Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling, and aggregating many features incurs high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests and for governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency and 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).
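The sampling decision described in the abstract can be illustrated with a minimal sketch. It is not Quiver's implementation: the function names, the adjacency-list graph, the per-hop fan-out tuple, and the threshold value are all assumptions made for illustration. The estimate assumes uniform neighbor sampling and counts repeated visits to the same node, so it is an upper-bound style approximation of the sampled graph size.

```python
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]  # hypothetical adjacency-list representation


def expected_sampled_size(graph: Graph, seed: int, fanouts: Tuple[int, ...]) -> float:
    """Estimate how many nodes multi-hop neighbor sampling touches from `seed`.

    Assumption: at hop i, min(degree, fanouts[i]) neighbors are drawn uniformly,
    so each neighbor is picked with probability min(degree, fanout) / degree.
    Nodes reached along different paths are counted more than once.
    """

    @lru_cache(maxsize=None)
    def expand(node: int, hop: int) -> float:
        neighbors = graph.get(node, [])
        if hop == len(fanouts) or not neighbors:
            return 1.0  # only the node itself
        pick_prob = min(len(neighbors), fanouts[hop]) / len(neighbors)
        return 1.0 + sum(pick_prob * expand(n, hop + 1) for n in neighbors)

    return expand(seed, 0)


def choose_device(graph: Graph, seeds: Sequence[int],
                  fanouts: Tuple[int, ...], gpu_threshold: float) -> str:
    """Pick "gpu" only when the predicted work is large enough to pay off.

    `gpu_threshold` is a made-up tuning knob; a real system would calibrate it
    by measuring CPU and GPU sampling performance on the target hardware.
    """
    total = sum(expected_sampled_size(graph, s, fanouts) for s in seeds)
    return "gpu" if total >= gpu_threshold else "cpu"


if __name__ == "__main__":
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(choose_device(star, [0], (2, 2), gpu_threshold=10))
    print(choose_device(star, [0, 1, 2], (2, 2), gpu_threshold=10))
```

In this toy setup a single small request stays on the CPU, while a batch whose predicted sampled size crosses the threshold is routed to the GPU, which mirrors the trade-off the abstract describes.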
Nvidia Doubles Down on AI Supercomputing
Nvidia has outpaced itself with so many new GPUs for large-scale computing over the last several years that the strategy now seems to be to leave some near-term capability aside to allow big releases at the expected time of year. In today's case, that timing is around the annual Supercomputing Conference (SC20), and while there is nothing entirely new to marvel at GPU-wise, there is a definite doubling of capacity and capability. The GPU maker announced that its A100 GPUs now offer double the memory and performance with the addition of 80GB HBM2e devices, which have already shipped to some of its biggest HPC "Superpod" customers and in its DGX systems, with wider availability via its partner network beginning in January 2021. For those partners, the upgrade is simple: the capability and capacity jump adds another option with no added overhead relative to the 400W of the 40GB GPUs, and for the early Superpod customers it's a simple tray shift, according to Paresh Kharya, Senior Director of Product Management at Nvidia. Having something new to announce is one thing, but it is likely that, without some delays, the original A100 would already have had the 80GB of memory.
Lenovo, Nvidia partnership bridges HPC and enterprise AI with switches for optimized networking - SiliconANGLE
Artificial intelligence is fast becoming a part of the everyday enterprise workflow, but the computing infrastructure to support such a data-intense task must modernize. As businesses transform to better leverage data intelligence and become more agile through cloud-native processes, high-performance networking becomes a priority. But investing in the InfiniBand standard for high-performance computing network switches has been a hard sell for information-technology departments with an existing Ethernet fabric in place. Enabling enterprises to catch the fast train to intelligent business operations are long-time partners Nvidia Corp. and Lenovo Group Ltd. "We love, from an HPC perspective, to use InfiniBand," said Scott Tease, general manager of HPC and AI at Lenovo. "But most enterprise clients are using Ethernet. We go to a partner that we've trusted for a very long time. And we selected the Nvidia Mellanox Ethernet switch family."
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems - insideHPC
In this video from the Stanford HPC Conference, DK Panda from Ohio State University presents: Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems. "This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS-OpenSHMEM/UPC/CAF/UPC++, OpenMP and CUDA) programming models by taking into account support for multi-core systems (KNL and OpenPower), high-performance networks, GPGPUs (including GPUDirect RDMA) and energy awareness. Features and sample performance numbers from MVAPICH2 libraries will be presented. For the Deep Learning domain, we will focus on popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) to extract performance and scalability with the MVAPICH2-GDR MPI library and RDMA-enabled Big Data stacks. Finally, we will outline the challenges in moving this middleware to Cloud environments."
Mellanox Technologies (MLNX) Q1 2017 Results - Earnings Call Transcript
At this time, all participants have been placed in a listen-only mode. And the floor will be open for your questions following the presentation. As a reminder, this conference is being recorded. And now I would like to turn the conference over to Mellanox. Leading the call today will be Eyal Waldman, President and CEO of Mellanox Technologies; and Jacob Shulman, Chief Financial Officer. By now, you've seen our press release and associated financial information that we furnished to the SEC on Form 8-K this afternoon. If not, you may access them on our website at ir.mellanox.com. As a reminder, today's discussion includes predictions, expectations, estimates and other information, all of which we consider to be forward-looking statements. Throughout today's discussion, we present important factors relating to our business that may potentially affect these forward-looking statements. These forward-looking statements are also subject to risks and uncertainties that may cause actual results to differ materially from statements made today. As a result, we caution you against placing undue reliance on these forward-looking statements. And we encourage you to review our most recent SEC reports, including our 10-K and 10-Q, for a complete discussion of these factors and other risks that may affect our future results or the market price of our ordinary shares.
InfiniBand will reach 200-gigabit speed next year
InfiniBand is set to hit 200Gbps (gigabits per second) in products that were announced Thursday, potentially accelerating machine-learning platforms as well as HPC (high-performance computing) systems. The massive computing performance of new servers equipped with GPUs calls for high network speeds, and these systems are quickly being deployed to handle machine-learning tasks, Dell'Oro Group analyst Sameh Boujelbene said. So-called HDR InfiniBand, which will be generally available next year in three sets of products from Mellanox Technologies, will double the top speed of InfiniBand. It will also have twice the top speed of Ethernet. But the high-performance crowd that's likely to adopt this new interconnect is a small one, Boujelbene said. Look for the top 10 percent of InfiniBand users, who already use 100Gbps InfiniBand, to jump on the new stuff, she said.