Overcoming ML Data Preprocessing Bottlenecks With gRPC
One measure of the health of a deep learning project is the degree to which it utilizes its allocated training resources. Whether you train in the cloud or on your own private infrastructure, training resources cost money, and any block of time in which they sit idle is a missed opportunity to increase training throughput and overall productivity. This is particularly true for the training accelerator -- typically the most expensive training resource -- whether it be a GPU, a Google TPU, or a Habana Gaudi.

This post is a sequel to a previous post on the topic of Overcoming Data Preprocessing Bottlenecks, which addressed the undesirable scenario in which your training accelerator, henceforth assumed to be a GPU, sits idle while it waits for data input from an overtaxed CPU. That post covered several different ways of addressing this type of bottleneck and demonstrated them on a toy example, while emphasizing that the best option depends heavily on the specifics of the model and the project at hand.
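To make the bottleneck concrete, here is a minimal sketch of a training loop in which each step must wait for CPU-bound preprocessing before the accelerator can run. All names and timings here are hypothetical stand-ins: `preprocess` and `train_step` simulate the CPU and GPU work with `time.sleep`, and the returned value is the fraction of wall time spent waiting on data rather than computing.

```python
import time

def preprocess(sample):
    # Hypothetical CPU-bound preprocessing, simulated at ~20 ms per batch.
    time.sleep(0.02)
    return sample

def train_step(batch):
    # Hypothetical accelerator compute, simulated at ~5 ms per batch.
    time.sleep(0.005)

def run(num_steps=20):
    """Run a synchronous train loop and return the idle fraction."""
    idle = 0.0
    start = time.perf_counter()
    for step in range(num_steps):
        t0 = time.perf_counter()
        batch = preprocess(step)  # the accelerator waits here for the CPU
        idle += time.perf_counter() - t0
        train_step(batch)
    total = time.perf_counter() - start
    return idle / total  # fraction of wall time spent waiting for data

if __name__ == "__main__":
    print(f"accelerator idle fraction: {run():.0%}")
```

With these illustrative timings the loop spends roughly four-fifths of its time waiting for data, which is exactly the kind of imbalance the techniques discussed in this series aim to eliminate.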
Jul-24-2022, 15:55:09 GMT