Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention