TVM: End-to-End Optimization Stack for Deep Learning

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

arXiv.org Artificial Intelligence 

Scalable frameworks such as TensorFlow, MXNet, Caffe, and PyTorch drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs, and deploying workloads to other platforms, such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs), requires laborious manual effort. We propose TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. We discuss the optimization challenges specific to deep learning that TVM solves: high-level operator fusion, low-level memory reuse across threads, mapping to arbitrary hardware primitives, and memory latency hiding. Experimental results demonstrate that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art libraries for low-power CPUs and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends by compiling to an FPGA-based generic deep learning accelerator. The compiler infrastructure is open source.
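To make the operator-level half of this stack concrete, below is a minimal sketch in the style of TVM's tensor-expression API: two elementwise operators (an add followed by a ReLU) are declared separately and then fused in the schedule so the intermediate result never round-trips through memory, which is the kind of operator fusion the abstract refers to. The module paths (tvm.te), schedule primitives, and lowering call reflect later open-source TVM releases and are shown here as an illustrative assumption, not code taken from the paper.

import tvm
from tvm import te

# Declare a symbolic length and the input tensors.
n = te.var("n")
A = te.placeholder((n,), name="A", dtype="float32")
B = te.placeholder((n,), name="B", dtype="float32")

# Two logically separate operators: elementwise add, then ReLU.
C = te.compute((n,), lambda i: A[i] + B[i], name="C")
D = te.compute((n,), lambda i: te.max(C[i], tvm.tir.const(0.0, "float32")), name="D")

# Schedule: fuse the producer C into D's loop so the intermediate
# values stay in registers instead of being written back to memory.
s = te.create_schedule(D.op)
s[C].compute_at(s[D], D.op.axis[0])

# Lower to a device-independent loop nest and inspect it.
print(tvm.lower(s, [A, B, D], simple_mode=True))

Printing the lowered module shows a single loop whose body computes max(A[i] + B[i], 0) in place, rather than two loops separated by a temporary buffer; decisions like this fusion, and the back-end-specific loop structure beneath it, are what the TVM stack applies per target.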
