Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computation assigned to GPUs. Yet, we observe that in scheduling GPU tasks, existing DL frameworks suffer from inefficiencies such as large scheduling overhead and unnecessary serial execution. To this end, we propose Nimble, a DL execution engine that runs GPU tasks in parallel with minimal scheduling overhead. Nimble introduces a novel technique called ahead-of-time (AoT) scheduling. Evaluation on a variety of neural networks shows that compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively.
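The core idea of ahead-of-time scheduling is that the (potentially expensive) dependency analysis and launch ordering happen once, before execution, so each later run only replays a pre-recorded schedule. The following is a minimal toy sketch of that idea, not Nimble's actual implementation; the names `Task`, `schedule_ahead_of_time`, and `replay` are illustrative.

```python
# Toy model of ahead-of-time (AoT) task scheduling: the launch order is
# computed once, ahead of execution, and each run replays the recorded
# schedule with no per-run dependency analysis.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    deps: tuple = ()


@dataclass
class Schedule:
    order: list = field(default_factory=list)  # topological launch order


def schedule_ahead_of_time(tasks):
    """Run the scheduling logic exactly once, before execution."""
    done, order = set(), []

    def visit(t):
        if t.name in done:
            return
        for d in t.deps:      # launch dependencies first
            visit(d)
        done.add(t.name)
        order.append(t.name)

    for t in tasks:
        visit(t)
    return Schedule(order)


def replay(schedule, launch):
    """Per-iteration hot path: just replay the recorded order."""
    for name in schedule.order:
        launch(name)


# Two independent kernels followed by a join.
a = Task("conv1")
b = Task("conv2")
c = Task("add", deps=(a, b))

sched = schedule_ahead_of_time([a, b, c])
launched = []
replay(sched, launched.append)
print(launched)  # ['conv1', 'conv2', 'add']
```

In the real system the replayed operations are GPU kernel launches on pre-assigned CUDA streams; the sketch only shows why replay removes the scheduling work from the critical path.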


Supplementary Materials for Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, Appendix A: Proofs on the Stream Assignment Algorithm of Nimble

Neural Information Processing Systems

In this section, we provide detailed proofs of the theorems presented in Section 4.2. We assume that the computation graph of a neural network is given. Here we define important concepts and terminology used in the following proofs. For any (u, v) ∈ E, either f(u) = f(v) or there exists a path P ⊆ E from u to v such that P ∩ Λ ≠ ∅. Prior to the proofs of Theorems 1 and 2, we state and prove Lemma 1 and Lemma 2. Lemma 1 is proved by contradiction.
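The stated property can be checked mechanically on a small graph. Below is a hedged sketch (not code from the paper) that assumes E is the edge set of the computation graph, f maps each node to a stream id, and Λ is a set of synchronization edges; it verifies that every cross-stream edge (u, v) is covered by some u→v path containing a synchronization edge.

```python
# Check the safety property: for every edge (u, v), either
# f(u) == f(v), or some path from u to v contains an edge in Lambda.
from collections import defaultdict


def has_sync_path(edges, sync_edges, u, v):
    """DFS from u to v; True if some u->v path uses a sync edge."""
    adj = defaultdict(list)
    for (a, b) in edges:
        adj[a].append(b)

    def dfs(node, seen_sync, visited):
        if node == v and seen_sync:
            return True
        for nxt in adj[node]:
            key = (nxt, seen_sync or ((node, nxt) in sync_edges))
            if key in visited:
                continue
            visited.add(key)
            if dfs(nxt, key[1], visited):
                return True
        return False

    return dfs(u, False, set())


def stream_assignment_is_safe(edges, f, sync_edges):
    return all(f[u] == f[v] or has_sync_path(edges, sync_edges, u, v)
               for (u, v) in edges)


# Diamond graph: "a" feeds "b" and "c" on different streams; "d" joins.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
f = {"a": 0, "b": 0, "c": 1, "d": 0}
sync = {("a", "c"), ("c", "d")}   # synchronize where streams diverge/rejoin
print(stream_assignment_is_safe(edges, f, sync))  # True
```

With `sync` empty the same assignment is reported unsafe, since the cross-stream edges (a, c) and (c, d) carry no synchronization.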







Thank you for the insightful comments and the opportunity to follow up

Neural Information Processing Systems

Thank you for the insightful comments and the opportunity to follow up. We compare PyTorch's native implementation to Nimble and measure its performance. Note that TensorRT and TVM currently do not support training. Figure 1: Speedup compared to TensorRT on inference workloads (batch size 1) using V100. Figure 2: Speedup compared to PyTorch on training using V100.


Review for NeurIPS paper: Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Weaknesses: This work is most applicable to networks with many small kernels, which may not be of broad interest in all cases. Nonetheless, it does help with training MobileNet and similar networks on desktop or server GPUs. I also feel that some parts of the paper overstate the contribution, either by only evaluating on these networks or by leaving out some optimized baselines. The biggest issues here are:

- For inference, you should compare against an optimized inference runtime such as TensorRT. This will likely do better than PyTorch or Caffe2 out of the box, even with TorchScript.




The biggest threat your nail salon has ever seen

FOX News

Nimble helps avoid the nail salon. Nail salons everywhere may soon face a serious competitor: Nimble, the robot manicurist. The company calls it the world's first smart home nail salon. It is a revolutionary device that lets you get a flawless manicure at home without any hassle. Nimble uses patented pioneering technology to scan, paint and dry your nails with one game-changing device.