This is a pretty active area of research, namely "edge device computing", which often intertwines with "model compression". An embedded device that has a GPU, such as the NVIDIA Jetson TX2, is often a good place to start: you get a smaller GPU with CUDA support in an embedded setting. However, you must make sure your models are small enough to fit on a device with such compute and memory limitations.

Frameworks like TensorFlow can train a model on a GPU; you can then save the weights and perform inference elsewhere on a CPU. You could do something like this on a Raspberry Pi, but keep in mind you will be severely limited on such a device.
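To make the "save the weights, infer elsewhere" split concrete, here's a minimal sketch of the mechanics using plain NumPy (so it runs anywhere). The network, shapes, and file name are made up for illustration; in a real workflow the weights would come out of TensorFlow (e.g. via `model.save_weights`) on the training machine and be loaded on the CPU-only target:

```python
import numpy as np
import os
import tempfile

# Hypothetical two-layer network; in practice these weights would be
# produced by training in a framework like TensorFlow on a GPU machine.
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
w2, b2 = rng.standard_normal((8, 2)), np.zeros(2)

# On the training machine: serialize all weights into one file.
path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, w1=w1, b1=b1, w2=w2, b2=b2)

# On the CPU-only device (e.g. a Raspberry Pi): load and run inference.
params = np.load(path)

def predict(x):
    # Forward pass only -- no training machinery needed on the device.
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU hidden layer
    return h @ params["w2"] + params["b2"]

x = rng.standard_normal((1, 4))
y = predict(x)
print(y.shape)  # (1, 2)
```

The point is that inference only needs the forward pass and the saved weights, which is a much smaller footprint than the full training stack.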
Apr-6-2019, 10:50:47 GMT