SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning