Faster Neighborhood Attention: Reducing the O(n²) Cost of Self Attention at the Threadblock Level