High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures