Object Detection from 9 FPS to 650 FPS in 6 Steps
Making code run fast on GPUs requires a very different approach to making code run fast on CPUs because the hardware architecture is fundamentally different. If you come from a background of efficient coding on CPU then you'll have to adjust some assumptions about what patterns are best. Machine learning engineers of all kinds should care about squeezing performance from their models and hardware -- not just for production purposes, but also for research and training. In research as in development, a fast iteration loop leads to faster improvement. This article is a practical deep dive into making a specific deep learning model (Nvidia's SSD300) run fast on a powerful GPU server, but the general principles apply to all GPU programming.
Nov-2-2020, 07:06:58 GMT