Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators