Thank you for the insightful comments and the opportunity to follow up

Neural Information Processing Systems

Thank you for the insightful comments and the opportunity to follow up. We compare additional baselines (including PyTorch's native implementation) to Nimble and measure their performance. Note that TensorRT and TVM do not currently support training. Figure 1: Speedup over TensorRT on inference workloads (batch size 1) using a V100. Figure 2: Speedup over PyTorch on training using a V100.


Single-GPU GNN Systems: Traps and Pitfalls

Gong, Yidong, Tarafder, Arnab, Afrin, Saima, Kumar, Pradeep

arXiv.org Artificial Intelligence

Current graph neural network (GNN) systems have established a clear trend of not reporting training accuracy results and of relying, directly or indirectly, largely on smaller datasets for evaluation. Our in-depth analysis shows that this leads to a chain of pitfalls in the system design and evaluation process, calling into question the practicality of many of the proposed system optimizations and affecting conclusions and lessons learned. We analyze many single-GPU systems and show the fundamental impact of these pitfalls. We further develop hypotheses, recommendations, and evaluation methodologies, and provide future directions. Finally, we develop a new reference system to establish a new line of optimizations rooted in solving the system-design pitfalls efficiently and practically. The proposed design can be productively integrated into prior works, thereby truly advancing the state of the art.


The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment

Fernandez, Jared, Kahn, Jacob, Na, Clara, Bisk, Yonatan, Strubell, Emma

arXiv.org Artificial Intelligence

Increased focus on the computational efficiency of NLP systems has motivated the design of efficient model architectures and improvements to underlying hardware accelerators. However, the resulting increases in computational throughput and reductions in floating point operations have not directly translated to improvements in wall-clock inference latency. We demonstrate that these discrepancies can be largely attributed to bottlenecks introduced by deep learning frameworks. We denote this phenomenon the "framework tax", and observe that the disparity is growing as hardware speeds increase over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. Code is available at https://github.com/JaredFern/Framework-Tax.
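The core observation above, that latency has a fixed per-call component independent of the workload's FLOPs, can be illustrated with a minimal timing sketch. The snippet below is not from the paper's codebase; it fits the simple linear model latency(n) = a + b·n at two problem sizes, where the intercept `a` stands in for the fixed per-call ("framework tax") overhead and the slope `b` for per-element compute cost. The toy kernel and all names here are hypothetical illustrations.

```python
import time

def estimate_fixed_overhead(fn, small_n, large_n, reps=200):
    """Fit latency(n) = a + b * n from average timings at two sizes.

    Intercept `a` approximates the fixed per-call overhead (the
    "framework tax"); slope `b` approximates per-element compute cost.
    `fn` is a hypothetical stand-in for a framework-dispatched kernel.
    """
    def avg_latency(n):
        start = time.perf_counter()
        for _ in range(reps):
            fn(n)
        return (time.perf_counter() - start) / reps

    t_small = avg_latency(small_n)
    t_large = avg_latency(large_n)
    b = (t_large - t_small) / (large_n - small_n)  # per-element cost
    a = t_small - b * small_n                      # fixed per-call cost
    return a, b

# Toy "kernel" whose cost grows linearly with n, plus Python call overhead.
overhead, per_elem = estimate_fixed_overhead(
    lambda n: sum(range(n)), small_n=1_000, large_n=100_000)
print(f"fixed overhead ~ {overhead * 1e6:.1f} us/call, "
      f"per-element ~ {per_elem * 1e9:.2f} ns")
```

When the workload is small, the fixed term `a` dominates total latency, which is why shrinking FLOPs alone does not proportionally shrink wall-clock time.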


PyTorch, a year in....

#artificialintelligence

Today marks 1 year since PyTorch was released publicly. It's been a wild ride: a year in our quest to build a flexible deep learning research platform. Over the last year, we've seen an amazing community of people using, contributing to, and evangelizing PyTorch -- thank you for the love. Looking back, we wanted to summarize PyTorch over the past year: the progress, the news, and highlights from the community. We've been blessed with a strong organic community of researchers and engineers who fell in love with PyTorch.