Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers