NNPACK - acceleration package for neural networks on multi-core CPUs • /r/MachineLearning
I'm guessing this library will get the most use in inference. For that you'll want to implement the 2x2 and 4x4 Winograd transforms. These can break a single input image into enough tiles to get you decent batched GEMM performance. The smaller tiles have much higher accuracy as well, and even work fine in fp16 or int16. Also, I suspect your multi-image Winograd implementation can be optimized quite a bit more than it is.
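To make the tiling point concrete, here is a rough NumPy sketch of the 2x2 case, assuming "2x2" means the F(2x2, 3x3) variant (2x2 output tiles, 3x3 filters, 4x4 input tiles) with the standard transform matrices. The function names are made up for illustration and this is not NNPACK's API or implementation. A single image is cut into overlapping 4x4 tiles, and each of the 16 transformed positions then becomes one (K x C) by (C x T) matrix product, where T is the number of tiles.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=np.float64)


def winograd_conv_f2x2_3x3(image, filters):
    """Hypothetical helper: 'valid' 3x3 correlation of one image.

    image:   (C, H, W), with H - 2 and W - 2 both even
    filters: (K, C, 3, 3)
    returns: (K, H - 2, W - 2)
    """
    C, H, W = image.shape
    K = filters.shape[0]
    th, tw = (H - 2) // 2, (W - 2) // 2  # number of tiles per axis

    # Overlapping 4x4 input tiles with stride 2 -> (C, T, 4, 4).
    tiles = np.stack(
        [image[:, 2 * i:2 * i + 4, 2 * j:2 * j + 4]
         for i in range(th) for j in range(tw)],
        axis=1,
    )

    V = BT @ tiles @ BT.T    # transformed input tiles, (C, T, 4, 4)
    U = G @ filters @ G.T    # transformed filters,     (K, C, 4, 4)

    # One (K x C) @ (C x T) product per transformed position (16 in total).
    # This is the batched GEMM; T grows with the image size.
    M = np.einsum('kcxy,ctxy->ktxy', U, V)

    Y = AT @ M @ AT.T        # inverse transform, (K, T, 2, 2)

    # Stitch the 2x2 output tiles back into a full output plane.
    return (Y.reshape(K, th, tw, 2, 2)
             .transpose(0, 1, 3, 2, 4)
             .reshape(K, 2 * th, 2 * tw))


def direct_conv(image, filters):
    """Reference 'valid' 3x3 correlation, used only to check the sketch."""
    C, H, W = image.shape
    K = filters.shape[0]
    out = np.zeros((K, H - 2, W - 2))
    for u in range(3):
        for v in range(3):
            patch = image[:, u:u + H - 2, v:v + W - 2]
            out += np.einsum('kc,chw->khw', filters[:, :, u, v], patch)
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((3, 10, 12))      # C=3, H=10, W=12 -> T=20 tiles
filters = rng.standard_normal((4, 3, 3, 3))   # K=4
err = np.abs(winograd_conv_f2x2_3x3(image, filters)
             - direct_conv(image, filters)).max()
print("max abs difference vs direct:", err)
```

The einsum is where a real implementation would call an optimized GEMM. Even with a batch of one image, T scales with the image area, which is what gives the matrix products a usable size.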
Mar-25-2016, 03:00:29 GMT