Deep learning processing unit delivers 135 GOPS/W on midrange FPGAs